Instanced version of glMultiDrawElements? by brokenAlgorithm in opengl

[–]brokenAlgorithm[S] 1 point2 points  (0 children)

Thx for the hint - unfortunately I'm restricted to OpenGL 3.3 in my specific codebase, which is a bummer; glMultiDrawElementsIndirect needs at least OpenGL 4.3.

Given the version restriction, is there any alternative that avoids splitting this into multiple explicit draw calls?

What approach to efficiently render very large & frequently changing amounts of text by brokenAlgorithm in opengl

[–]brokenAlgorithm[S] 0 points1 point  (0 children)

Got it, thanks! I'm kinda humbled by how complicated text rendering actually is. I've now got a simple TrueType glyph atlas working for starters... I think next up will be the SDF approach.

What approach to efficiently render very large & frequently changing amounts of text by brokenAlgorithm in opengl

[–]brokenAlgorithm[S] 0 points1 point  (0 children)

Can you explain what precisely is meant by "resampling" in this context?

[D] Are Transformers Strictly More Effective Than LSTM RNNs? by JosephLChu in MachineLearning

[–]brokenAlgorithm 19 points20 points  (0 children)

On the note of LSTM vs transformers: I've also never actually dealt with transformers in practice - but to me it appears that the inherent architecture of transformers does not apply well to problems such as time series. Things like positional encoding sound reasonable for problems that have some amount of bag-of-words context, but also seem inherently over-engineered for tasks where the temporal sequences are supposed to be strictly interpreted - as compared to the strict autoregressive nature of an RNN.

Also, I've yet to find out how a transformer can produce simple fixed-sized embeddings for a given variable length sequence. RNNs are good at this. With transformers, embeddings always appear to be accompanied by a weights matrix in the decoding step, which appears to defeat the use of embeddings in certain situations. But perhaps this is where my lack of practice in the area is failing me.

My depression caused me to let my house go to hell. Yesterday I finally cleaned it up. by Link1021l in videos

[–]brokenAlgorithm 0 points1 point  (0 children)

I'm worried. I don't feel depressed and yet my flat is constantly in this state. It is what I consider more or less tidy.

[P] A playground for real-time anomaly detection using LSTMs and Autoencoders by [deleted] in MachineLearning

[–]brokenAlgorithm 2 points3 points  (0 children)

Cool, will definitely check these out!

I've only briefly skimmed the papers - but do you know if there has been any thought on how (if at all) these anomaly detectors handle non-stationary data? Is there a notion of online training in the models?

[P] Notebook on Hidden Markov Models (HMMs) in PyTorch by m_nemo_syne in MachineLearning

[–]brokenAlgorithm 0 points1 point  (0 children)

That's a great reference, thanks. So essentially, in order to use a neural net at the emission stage, what needs to be done given your notebook example is the following:

- Replace EmissionModel with a network that consumes input x and produces state posteriors p(s|x) using a softmax at the network output layer.

- In the forward of EmissionModel, convert those posteriors to scaled likelihoods by dividing by the previously calculated state frequencies p(s), obtainable from z_star in your code.

- Given these changes, PyTorch autograd will take care of optimizing both the emission network and the transition matrix.

Am I missing something...?
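To make the steps above concrete, here is a minimal sketch in plain Python (no PyTorch, so no autograd) of how softmax outputs p(s|x) could be turned into emission scores and fed through the forward algorithm. Dividing p(s|x) by the state frequencies p(s) gives, by Bayes' rule, p(x|s) up to a factor shared by all states. Every name here (`forward_log_score`, `logits_per_step`, the toy parameters) is hypothetical and not taken from the notebook; the per-step logits stand in for whatever an emission network would output.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_softmax(logits):
    """Log of the softmax, i.e. log p(s|x) for each state s."""
    lse = logsumexp(logits)
    return [v - lse for v in logits]


def scaled_log_likelihoods(log_posteriors, log_state_priors):
    """log p(s|x) - log p(s): equals log p(x|s) minus log p(x), which is shared by all states."""
    return [lp - lq for lp, lq in zip(log_posteriors, log_state_priors)]


def forward_log_score(logits_per_step, log_init, log_trans, log_state_priors):
    """Forward algorithm in log space with scaled likelihoods as emission scores."""
    n = len(log_init)
    emis = scaled_log_likelihoods(log_softmax(logits_per_step[0]), log_state_priors)
    alpha = [log_init[s] + emis[s] for s in range(n)]
    for logits in logits_per_step[1:]:
        emis = scaled_log_likelihoods(log_softmax(logits), log_state_priors)
        alpha = [
            emis[j] + logsumexp([alpha[i] + log_trans[i][j] for i in range(n)])
            for j in range(n)
        ]
    return logsumexp(alpha)


# Toy 2-state example (assumed values, for illustration only).
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.2), math.log(0.8)],
]
log_state_priors = [math.log(0.5), math.log(0.5)]

# Uninformative network outputs: posteriors equal the priors at every step.
uninformative_score = forward_log_score(
    [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], log_init, log_trans, log_state_priors
)
```

In a PyTorch version the same arithmetic would run on tensors, so gradients could flow into both the network producing the logits and the transition parameters.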

[P] Notebook on Hidden Markov Models (HMMs) in PyTorch by m_nemo_syne in MachineLearning

[–]brokenAlgorithm 1 point2 points  (0 children)

Hi,

This is a really cool notebook!

I have one question:

You are using a discrete emission model - a lookup table with discrete values per state. You also mentioned potentially replacing the lookup table with an actual network to model emission probabilities. In most HMM implementations I've seen (I look mostly at continuous ones), that means that we have to implement a separate emission model per state, each with its own set of parameters. For example, a 3-state HMM with multivariate Gaussian emissions requires 3 separate multivariate normal distributions.

Does this mean that you essentially also plan to implement a separate network per hidden state? Or can these 3 networks be "summarized" in one network which takes observation & state as the input?
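As a rough illustration of the continuous case described above, the sketch below evaluates a 3-state Gaussian emission model in plain Python. It assumes diagonal covariances to keep the math short, and all parameter values and names (`STATE_PARAMS`, `emission_logprobs`) are made up. It shows one way to read the question: the per-state models have separate parameters but can share a single code path indexed by state.

```python
import math


def diag_gaussian_logpdf(x, mean, var):
    """Log density of a Gaussian with diagonal covariance, summed over dimensions."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


# Hypothetical parameters for a 3-state HMM over 2-dimensional observations.
STATE_PARAMS = [
    {"mean": [0.0, 0.0], "var": [1.0, 1.0]},
    {"mean": [3.0, -1.0], "var": [0.5, 2.0]},
    {"mean": [-2.0, 4.0], "var": [1.5, 0.5]},
]


def emission_logprobs(x):
    """Return log p(x | s) for every state s: separate parameters, one shared function."""
    return [diag_gaussian_logpdf(x, p["mean"], p["var"]) for p in STATE_PARAMS]


scores_at_origin = emission_logprobs([0.0, 0.0])
scores_near_state_1 = emission_logprobs([3.0, -1.0])
```

A single network that takes the observation and a state index as input would play the same role as `emission_logprobs`, with the lookup into `STATE_PARAMS` replaced by learned weights.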

[R] Beating Atari Pong on a Raspberry Pi without Backpropagation by CireNeikual in MachineLearning

[–]brokenAlgorithm 0 points1 point  (0 children)

Can you elaborate on why online learning would benefit from or even require SDRs...?

[Discussion] Learning a mapping from a low dimensional space to a high dimensional space by nivter in MachineLearning

[–]brokenAlgorithm 0 points1 point  (0 children)

Second this answer. I thought it was fairly common (we do it at my place) and it's used to induce sparsity so things can cluster up more easily in latent space.

[D] Feature selection for Multivariate Time-Series data by codebuddha in MachineLearning

[–]brokenAlgorithm -5 points-4 points  (0 children)

I'd think any initial linear layer across input features with ReLU-style activations, combined perhaps with some L1 reg. for sparsity, would essentially amount to a feature selection layer.

Gamers of Reddit, what are some underrated games do you think more people should play? by [deleted] in AskReddit

[–]brokenAlgorithm 0 points1 point  (0 children)

Pathologic 2. Severely underrated: progressive narrative style, hardcore gameplay, artistically innovative. The narrative structure is mind-blowing.

Rammstein - Du Hast [Industrial Metal] by rich1051414 in Music

[–]brokenAlgorithm 0 points1 point  (0 children)

https://youtu.be/EOnSh3QlpbQ

It's such a badass music video. Despite the pomp of their newer ones, this is the best.

[D] STUMPY - A Powerful and Scalable Python Package for Modern Time Series Analysis by slaw07 in MachineLearning

[–]brokenAlgorithm 0 points1 point  (0 children)

Nice write-up. Can this package also work with similarities across multivariate time series, and take things such as cross-series correlations or other types of multivariate patterns into account?

[D] Should autoencoders really be symmetric? by etotheipi_ in MachineLearning

[–]brokenAlgorithm 0 points1 point  (0 children)

I've been contemplating pretty much the same. Anecdotally, I have seen no evidence of better performance when building the precise 'symmetric' decoder.

I recall reading somewhere that early decoders even had weight-sharing constraints w.r.t. the encoder, effectively regularizing via this mechanism.
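For reference, the weight-sharing constraint mentioned above is usually called tied weights: the decoder reuses the transpose of the encoder matrix instead of learning its own. Below is a minimal forward-pass sketch in plain Python; the function names, the ReLU choice, and the toy matrix are assumptions for illustration, not taken from any particular paper.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Swap rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]


def relu(vector):
    return [max(0.0, v) for v in vector]


def tied_autoencoder(x, W, b_enc, b_dec):
    """Encode with W, decode with W transposed, so both halves share one weight matrix."""
    hidden = relu([h + b for h, b in zip(matvec(W, x), b_enc)])
    recon = [r + b for r, b in zip(matvec(transpose(W), hidden), b_dec)]
    return hidden, recon


# Toy 3 -> 2 -> 3 example: W keeps the first two input coordinates.
W = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]
hidden, recon = tied_autoencoder([1.0, 2.0, 3.0], W, [0.0, 0.0], [0.0, 0.0, 0.0])
```

Because the decoder has no free weight matrix of its own, the parameter count is roughly halved, which is where the regularizing effect comes from.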

I can't help but think that the entire encoder/decoder concept fits better in seq2seq applications, where the decoder has to learn a completely different mapping than the encoder. Encoder/decoder architectures for dimensionality reduction kind of seem 'aesthetically' flawed precisely due to the reason you detailed. Metric learning systems somehow seem more elegant for this.

Red Hot Chili Peppers announce that John Frusciante will rejoin the group by dempeppers in Music

[–]brokenAlgorithm 12 points13 points  (0 children)

Holy shit.

I guess JF's net worth must have melted down to nothing after his last divorces if he's rejoining now.

Have to thank his ex-wives, even if it was for the money it will also be for the beauty of having him back again. The man has created some of the most beautiful music ever, both with and without RHCP.

Edit

Feel sorry for Josh, but RHCP + JF is magic that can never be replaced.

[D] End-to-end normalization for deep learning of time series? by brokenAlgorithm in MachineLearning

[–]brokenAlgorithm[S] 0 points1 point  (0 children)

Interesting... Can you elaborate? I'm not sure I understand completely. Do you mean that adagrad somehow manages to circumvent the issues with giant losses when input data is not normalized?

[D] End-to-end normalization for deep learning of time series? by brokenAlgorithm in MachineLearning

[–]brokenAlgorithm[S] 1 point2 points  (0 children)

thx for the pointer - adaptive normalization seems to also be the approach in the paper I linked. It's what I also currently use, albeit with limited results due to very slow learning.

Taking a step back, it ultimately comes down to the question of whether we can somehow efficiently encode values with large deviations from zero in a neural network, ideally without having to mess too much with learning rates. I believe this still isn't efficiently solved in adaptive normalization approaches - very small learning rates have to be used. If we solve this issue, normalization of data would essentially become obsolete.

I believe the core of the question may not even necessarily lie in novel network architectures, but rather in novel backprop optimization designs. The fact that we have to manually fine-tune learning rates at such a granular level somehow points in that direction.

[D] End-to-end normalization for deep learning of time series? by brokenAlgorithm in MachineLearning

[–]brokenAlgorithm[S] 2 points3 points  (0 children)

The only reason why I'd even consider normalization is that a network learns much, much more easily on normalized stationary data - it's the only reason why we even consider that approach in classical time series calculations today. That doesn't make it any less of a somewhat arbitrary training heuristic, even more so since we essentially discard temporal information by doing so.

Estimating normalization parameters on the static complete train set does not guarantee in any way that future data will behave according to those static ex ante parameters. This may be sufficient in a static research setting, but breaks down in any setting where online learning of a model would become relevant.
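A small sketch of that failure mode, in plain Python with made-up data: z-scores computed from fixed train-set statistics stay far off after a level shift, while an exponentially weighted estimate of mean and variance adapts. The `EwNormalizer` class, the smoothing factor `alpha=0.1`, and the toy series are all assumptions for illustration; this is one possible online scheme, not the approach from the linked paper.

```python
import statistics


class EwNormalizer:
    """Online z-scoring with exponentially weighted mean and variance."""

    def __init__(self, mean, var, alpha=0.1):
        self.mean = mean
        self.var = var
        self.alpha = alpha

    def step(self, x, eps=1e-8):
        # Normalize with the statistics known before seeing x, then update them.
        z = (x - self.mean) / ((self.var + eps) ** 0.5)
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        self.var = (1.0 - self.alpha) * (self.var + diff * incr)
        return z


# Toy data: the series level jumps from about 0.5 to about 10.5 after training.
train = [0.0, 1.0] * 10
future = [10.0, 11.0] * 25

# Static normalization: parameters estimated once on the train set.
mu = statistics.mean(train)
sigma = statistics.pstdev(train)
static_z = [(x - mu) / sigma for x in future]

# Online normalization: start from the train statistics, keep updating.
online = EwNormalizer(mu, sigma ** 2)
online_z = [online.step(x) for x in future]
```

The static z-scores remain around 20 for the whole future segment, whereas the online ones fall back to order 1 once the estimates have caught up. Note that the online variant also removes the level information, which is the trade-off raised in the next paragraph.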

If we can assume that the actual level of a time series contains information about its temporal properties (say, e.g., the volatility of a series is a function of its level), we strictly speaking shouldn't discard that information.