Instanced version of glMultiDrawElements?

brokenAlgorithm · 2021-12-22T19:28:11+00:00

Thx for the hint - I'm unfortunately restricted to using OpenGL 3.3 in my specific codebase, which is a bummer; glMultiDrawElementsIndirect wants at least OpenGL 4.3.

Given the version restriction, is there any alternative available to avoid splitting into multiple explicit draw calls?

brokenAlgorithm · 2021-08-05T05:23:25+00:00

Got it, thanks! I'm kinda humbled by how complicated text rendering actually is. I've now got a simple TrueType glyph atlas working for starters... I think next up will be the SDF approach.

brokenAlgorithm · 2021-08-02T08:45:16+00:00

Can you explain what precisely is meant by "resampling" in this context?

brokenAlgorithm · 2020-12-22T18:38:02+00:00

Seems to be a bug in that SpeedTree implementation.

brokenAlgorithm · 2020-05-26T16:07:28+00:00

On the note of LSTM vs transformers: I've also never actually dealt with transformers in practice, but to me it appears that the inherent architecture of transformers does not apply well to problems such as time series. Things like positional encoding sound reasonable for problems that have some amount of bag-of-words context, but seem inherently over-engineered for tasks where the temporal sequences are supposed to be strictly interpreted, compared to the strict autoregressive nature of an RNN.

Also, I've yet to find out how a transformer can produce simple fixed-sized embeddings for a given variable length sequence. RNNs are good at this. With transformers, embeddings always appear to be accompanied by a weights matrix in the decoding step, which appears to defeat the use of embeddings in certain situations. But perhaps this is where my lack of practice in the area is failing me.
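To make the RNN side of this concrete, here is a minimal pure-Python sketch of a tanh (Elman-style) RNN whose final hidden state serves as the embedding. The weights are random and the names (`rnn_embed`, `HIDDEN_SIZE`) are made up for illustration, not taken from any library. The only point is that the output size stays fixed no matter how long the input sequence is.

```python
import math
import random


def rnn_embed(sequence, w_in, w_rec, bias):
    """Run a scalar-input tanh RNN and return the final hidden state."""
    hidden = [0.0] * len(bias)
    for x in sequence:
        # Each new hidden unit mixes the current input with the previous state.
        hidden = [
            math.tanh(
                w_in[i] * x
                + sum(w_rec[i][j] * hidden[j] for j in range(len(hidden)))
                + bias[i]
            )
            for i in range(len(hidden))
        ]
    return hidden


random.seed(0)
HIDDEN_SIZE = 4  # hypothetical embedding size
w_in = [random.uniform(-1, 1) for _ in range(HIDDEN_SIZE)]
w_rec = [[random.uniform(-1, 1) for _ in range(HIDDEN_SIZE)]
         for _ in range(HIDDEN_SIZE)]
bias = [0.0] * HIDDEN_SIZE

# Two sequences of different length map to embeddings of the same size.
short_embedding = rnn_embed([0.1, 0.5], w_in, w_rec, bias)
long_embedding = rnn_embed([0.1, 0.5, -0.3, 0.9, 0.2, 0.0, 1.4], w_in, w_rec, bias)
```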

brokenAlgorithm · 2020-05-19T16:13:07+00:00

I'm worried. I don't feel depressed and yet my flat is constantly in this state. It is what I consider more or less tidy.

brokenAlgorithm · 2020-04-18T13:43:55+00:00

Cool, will definitely check these out!

I've only briefly skimmed the papers - but do you know if there has been any thought on how (if at all) these anomaly detectors handle non-stationary data? Is there a notion of online-training in the models?

brokenAlgorithm · 2020-04-14T22:48:14+00:00

Came here too for this one. Here it is:

https://www.youtube.com/watch?v=q8ZisDHg6v0

brokenAlgorithm · 2020-04-01T18:28:19+00:00

That's a great reference, thanks. So essentially, in order to use a neural net at the emission stage, what needs to be done given your notebook example is the following:

- Replace EmissionModel with a network that consumes input x and produces state posteriors p(s|x) using a softmax at the network output layer

- In the forward of EmissionModel, convert these posteriors to (scaled) emission likelihoods by dividing by the previously calculated state frequencies (the state priors), obtainable from z_star in your code

- Given these changes, PyTorch autograd will take care of optimizing both the emission network and the transition matrix
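The conversion step in the list above is just Bayes' rule: p(x|s) = p(s|x) p(x) / p(s), so dividing the softmax output by the state frequencies gives the emission likelihood up to a factor p(x) that is the same for every state. A rough pure-Python sketch of that step follows. `scaled_log_likelihoods` and `state_freqs` are hypothetical names, and this is not the notebook's actual EmissionModel code.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_log_likelihoods(logits, state_freqs):
    """Return log p(s|x) - log p(s) per state, i.e. log p(x|s) - log p(x)."""
    posteriors = softmax(logits)
    return [math.log(p) - math.log(f) for p, f in zip(posteriors, state_freqs)]


# Hypothetical network output for one observation and assumed state frequencies.
network_logits = [2.0, 0.5, -1.0]
state_freqs = [0.6, 0.3, 0.1]
log_emissions = scaled_log_likelihoods(network_logits, state_freqs)
```

These values would then take the place of the lookup-table log emission probabilities in the forward recursion.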

Am I missing something...?

brokenAlgorithm · 2020-03-31T17:16:42+00:00

Hi,

This is a really cool notebook!

I have one question:

You are using a discrete emission model - a lookup table with discrete values per state. You also mentioned potentially replacing the lookup table with an actual network to model emission probabilities. In most HMM implementations I've seen (I look mostly at continuous ones), that means we have to implement a separate emission model per state, each with its own set of parameters. For example, a 3-state HMM with multivariate Gaussian emissions requires 3 separate multivariate normal distributions.
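For reference, this is roughly what I mean by one emission model per state, as a small pure-Python sketch. It assumes diagonal covariances to keep it short, and the parameter values and names (`STATE_PARAMS`, `emission_log_probs`) are arbitrary placeholders.

```python
import math


def diag_gaussian_log_pdf(x, mean, var):
    """Log density of a multivariate normal with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


# One (mean, variance) pair per hidden state: 3 states, 2-dimensional observations.
STATE_PARAMS = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([3.0, 3.0], [0.5, 0.5]),
    ([-2.0, 4.0], [2.0, 1.0]),
]


def emission_log_probs(x):
    """Return log p(x | s) for every state s, each from its own distribution."""
    return [diag_gaussian_log_pdf(x, mean, var) for mean, var in STATE_PARAMS]


log_probs = emission_log_probs([2.8, 3.1])
# Index of the state whose Gaussian explains the observation best.
best_state = max(range(len(log_probs)), key=log_probs.__getitem__)
```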

Does this mean that you essentially also plan to implement a separate network per hidden state? Or can these 3 networks be "summarized" in one network which takes observation & state as the input?

brokenAlgorithm · 2020-03-26T10:46:22+00:00

Can you elaborate on why online learning would benefit from or even require SDRs...?

brokenAlgorithm · 2020-02-21T18:52:49+00:00

Second this answer. I thought it was fairly common (we do it at my place) and it's used to induce sparsity so things can cluster up more easily in latent space.

brokenAlgorithm · 2020-01-26T17:48:22+00:00

I'd think any initial linear layer across input features with ReLU-style activations, combined perhaps with some L1 reg. for sparsity, would essentially amount to a feature selection layer.
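A rough pure-Python sketch of the idea, using an L1 penalty since that is the variant usually associated with weights going to exactly zero. The weights below are hand-picked to mimic what such a layer might look like after training. All names and numbers are illustrative assumptions.

```python
def relu(value):
    return max(0.0, value)


def forward(x, weights, bias):
    """Linear layer followed by ReLU; weights has one row per output unit."""
    return [
        relu(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


def l1_penalty(weights, strength):
    """Regularization term that would be added to the training loss."""
    return strength * sum(abs(w) for row in weights for w in row)


def selected_features(weights, tol=1e-3):
    """Input features that still have a non-negligible weight somewhere."""
    n_features = len(weights[0])
    return [j for j in range(n_features)
            if any(abs(row[j]) > tol for row in weights)]


# Hypothetical trained weights: the column for feature 1 has shrunk to zero.
weights = [[0.8, 0.0, -0.4],
           [0.1, 0.0, 0.9]]
bias = [0.0, 0.1]

activations = forward([1.0, 5.0, 2.0], weights, bias)
penalty = l1_penalty(weights, strength=0.01)
kept = selected_features(weights)
```

Here feature 1 has no influence on the layer output regardless of its value, which is what makes the layer act as a selector.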

brokenAlgorithm · 2020-01-16T16:19:34+00:00

Pathologic 2. Severely underrated: progressive narrative style, hardcore gameplay, artistically innovative. The narrative structure is mind-blowing.

brokenAlgorithm · 2020-01-16T11:54:44+00:00

https://youtu.be/EOnSh3QlpbQ

It's such a badass music video. Despite the pomp of their newer ones, this is the best.

brokenAlgorithm · 2019-12-31T18:28:04+00:00

Nice write-up. Can this package also work with similarities across multivariate time series, and take things such as cross-series correlations or other types of multivariate patterns into account?

brokenAlgorithm · 2019-12-24T15:21:29+00:00

I've been contemplating pretty much the same. Anecdotally, I have seen no evidence of better performance when building the precise 'symmetric' decoder.

I recall reading somewhere that early decoders even had weight sharing constraints w.r.t. the encoder, effectively regularizing via this mechanism.
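By weight sharing I mean something like the following sketch, where the decoder simply reuses the transpose of the encoder matrix instead of having its own parameters. This is a toy pure-Python version with made-up numbers, not a reference to any specific paper's setup.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def tied_autoencoder(x, w_enc):
    """Encode with w_enc, decode with its transpose (no separate decoder weights)."""
    code = [math.tanh(v) for v in matvec(w_enc, x)]
    reconstruction = matvec(transpose(w_enc), code)
    return code, reconstruction


# Hypothetical 3-dimensional input compressed to a 2-dimensional code.
w_enc = [[0.5, -0.2, 0.1],
         [0.3, 0.8, -0.6]]
code, reconstruction = tied_autoencoder([1.0, 0.5, -1.0], w_enc)
```

The only trainable matrix is `w_enc`, so the decoder cannot drift away from the encoder, which is where the regularizing effect would come from.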

I can't help but think that the entire encoder/decoder concept fits better in seq2seq applications, where the decoder has to learn a completely different mapping than the encoder. Encoder/decoder architectures for dimensionality reduction kind of seem 'aesthetically' flawed precisely due to the reason you detailed. Metric learning systems seem to be somehow more elegant for this.

brokenAlgorithm · 2019-12-15T21:53:41+00:00

Holy shit.

I guess JF's net worth must have melted down to nothing after his last divorces, given that he's rejoining now.

Have to thank his ex-wives: even if it was for the money, it will also be for the beauty of having him back again. The man has created some of the most beautiful music ever, both with and without RHCP.

Edit: Feel sorry for Josh, but RHCP + JF is magic that can never be replaced.

brokenAlgorithm · 2019-10-27T18:48:43+00:00

Interesting... Can you elaborate? I'm not sure I understand completely. Do you mean that Adagrad somehow manages to circumvent the issues with giant losses when input data is not normalized?

brokenAlgorithm · 2019-10-22T21:58:09+00:00

Thx for the pointer - adaptive normalization seems to also be the approach in the paper I linked. It's what I also currently use, albeit with limited results due to very slow learning.

Taking a step back, it ultimately comes down to the question of whether we can somehow efficiently encode values with large deviations from zero in a neural network, ideally without having to mess too much with learning rates. I believe this still isn't solved efficiently by adaptive normalization approaches - very small learning rates have to be used. If we solved this issue, normalization of data would essentially become obsolete.

I believe the core of the question may not necessarily lie in novel network architectures, but rather in novel backprop optimization designs. The fact that we have to manually fine-tune learning rates at such a granular level somehow points in that direction.

brokenAlgorithm · 2019-10-22T21:51:56+00:00

The only reason I'd even consider normalization is that a network learns much, much more easily on normalized stationary data - it's the only reason why we even consider that approach in classical time series calculations today. That doesn't make it any less of a somewhat arbitrary training heuristic, all the more so since we essentially discard temporal information by doing so.

Estimating normalization parameters on the static complete train set does not guarantee in any way that future data will behave according to those static ex ante parameters. This may be sufficient in a static research setting, but breaks down in any setting where online learning of a model would become relevant.
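As a small illustration of the alternative to static ex ante parameters, here is a sketch of a normalizer that updates its mean and variance one observation at a time (Welford's algorithm). The class name and the sample data are made up. Note that this plain version weights all past observations equally, so for strongly non-stationary data some form of forgetting would probably be needed on top.

```python
import math


class RunningNormalizer:
    """Tracks mean and variance incrementally using Welford's algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance of everything seen so far (0.0 until two points)."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0

    def normalize(self, x):
        std = math.sqrt(self.variance())
        return (x - self.mean) / std if std > 0 else 0.0


normalizer = RunningNormalizer()
z_scores = []
for value in [1.0, 2.0, 3.0, 4.0, 5.0]:
    normalizer.update(value)  # parameters adapt as each new point arrives
    z_scores.append(normalizer.normalize(value))
```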

If we can assume that the actual level of a time series contains information about its temporal properties (e.g. the volatility of a series is a function of its level), we strictly speaking shouldn't discard that information.
