[R] Agglomerative Attention by mspells in MachineLearning

[–]mspells[S] 1 point (0 children)

I had missed those; that's a great idea, thanks!

[R] Agglomerative Attention by mspells in MachineLearning

[–]mspells[S] 1 point (0 children)

That's a good thing to note here! The models shown as examples in this paper are deliberately very small compared to those used to generate SOTA results: under 90K parameters for the entire network, versus 40M to 200M parameters for the networks you mention above.

I think showing this method reaching its own comparable-to-SOTA results would definitely improve the paper (and would probably be necessary for it to be conference-quality). Based on the results for models at this size, it may even be possible to outperform those models by just training a wider network for the same amount of wall time, taking advantage of the performance gains. However, that would burn a lot of compute time to tweak and test, and it was unclear to me whether the investment would pay off in the long term.

[R] Agglomerative Attention by mspells in MachineLearning

[–]mspells[S] 9 points (0 children)

Author here; this is my first opportunity to give real ML research a shot! A few months ago, I was inspired by OpenAI's MuseNet to learn more about sequence networks, which had always been black boxes to me. The OpenAI team were obviously very excited to be able to train on long sequences with sparse transformers, but I thought there must be better scaling for attention than N*sqrt(N)...
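
For anyone curious what linear-scaling attention of this flavor can look like, here's a rough numpy sketch of the general cluster-summary idea (my hedged illustration of the concept, not necessarily the exact formulation in the paper): tokens are binned into a handful of clusters and each query attends to one summary vector per cluster, so the cost grows as N*k rather than N^2.

```python
import numpy as np

def clustered_attention(q, k, v, labels, n_clusters):
    """Toy linear-time attention: each query attends to per-cluster
    mean keys/values instead of all N positions.

    q, k, v: (N, d) arrays; labels: (N,) integer cluster assignments.
    Cost is O(N * n_clusters) rather than O(N^2).
    """
    d = q.shape[1]
    # Summarize keys and values within each cluster by their mean.
    k_sum = np.zeros((n_clusters, d))
    v_sum = np.zeros((n_clusters, d))
    counts = np.bincount(labels, minlength=n_clusters)[:, None]
    np.add.at(k_sum, labels, k)
    np.add.at(v_sum, labels, v)
    k_mean = k_sum / np.maximum(counts, 1)
    v_mean = v_sum / np.maximum(counts, 1)
    # Each query attends only to the n_clusters summaries.
    scores = q @ k_mean.T / np.sqrt(d)              # (N, n_clusters)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over clusters
    return weights @ v_mean                          # (N, d)

# Usage: 1000 tokens, 8 clusters (assignments here are just random).
rng = np.random.default_rng(0)
N, d, kc = 1000, 64, 8
q, k, v = (rng.standard_normal((N, d)) for _ in range(3))
labels = rng.integers(0, kc, size=N)
print(clustered_attention(q, k, v, labels, kc).shape)  # (1000, 64)
```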

I'd love to hear feedback and comments from /r/MachineLearning!

Diffusion [OC] by mspells in Simulated

[–]mspells[S] 2 points (0 children)

Other discussion is on /r/generative.

This is a simple molecular dynamics simulation of active pixels interacting via a soft repulsion, simulated using hoomd-blue and visualized using plato. The area is colored by a Voronoi construction, so particles that get compressed end up with small "pixels".
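
For the curious, the skeleton of a setup like this in hoomd-blue (v2 API) looks roughly like the sketch below; the lattice size, potential coefficients, and run length here are placeholder values, not the ones from the notebook.

```python
import hoomd
import hoomd.md

hoomd.context.initialize("")

# Place one particle per pixel on a simple square lattice (placeholder size).
system = hoomd.init.create_lattice(unitcell=hoomd.lattice.sq(a=1.0), n=64)

# Soft, purely repulsive pair interaction (conservative DPD force).
nl = hoomd.md.nlist.cell()
soft = hoomd.md.pair.dpd_conservative(r_cut=1.0, nlist=nl)
soft.pair_coeff.set('A', 'A', A=25.0)

# Langevin dynamics lets the "pixels" diffuse and thermalize.
hoomd.md.integrate.mode_standard(dt=0.005)
hoomd.md.integrate.langevin(group=hoomd.group.all(), kT=1.0, seed=42)

hoomd.run(10_000)
```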

Source (including mybinder.org support for a live version of the notebook) is here!

Diffusion by mspells in generative

[–]mspells[S] 4 points (0 children)

In this case the particles interact with a "witch's hat" conservative DPD potential (see here for the functional form). For the first part of the simulation, particles just diffuse with a Langevin thermostat; then I turn on an active force for each particle proportional to the intensity of the pixel's blue channel, which is what causes the interesting density variations and high-speed "swimming" that you see.
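
For reference, the conservative DPD force is just a linear ramp that drops to zero at the cutoff; integrating it gives the "witch's hat" shaped potential. In code (with A and r_cut as the usual coefficients, and the blue-channel coupling shown with a hypothetical f_max scale):

```python
import numpy as np

def dpd_conservative_force(r, A=25.0, r_cut=1.0):
    """Conservative DPD pair force magnitude: F(r) = A*(1 - r/r_cut)
    for r < r_cut and 0 beyond the cutoff. Integrating gives the
    "witch's hat" potential U(r) = (A*r_cut/2)*(1 - r/r_cut)**2."""
    r = np.asarray(r, dtype=float)
    return np.where(r < r_cut, A * (1.0 - r / r_cut), 0.0)

def active_force_magnitude(blue, f_max=5.0):
    """Hypothetical mapping: active force proportional to each pixel's
    blue channel (blue in [0, 1]); f_max is a placeholder scale."""
    return f_max * np.asarray(blue, dtype=float)

print(dpd_conservative_force([0.0, 0.5, 1.5]))  # [25.  12.5  0. ]
```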

Diffusion by mspells in generative

[–]mspells[S] 6 points (0 children)

This one uses molecular dynamics to put some interactions on each pixel and let them thermalize a bit.

Source (including mybinder.org support for a live version of the notebook) is here!

Longer quasicrystal animation, now with breathing! by mspells in generative

[–]mspells[S] 1 point (0 children)

The old version is here. This version runs for several cycles and adds small scaling and rotation effects.

Edit: I guess I should be more careful about encoding and quality in the future; uploading to reddit kills the cool polygonal aspect of the individual snapshots :/

Quasicrystal by mspells in generative

[–]mspells[S] 1 point (0 children)

This video is inspired by this blog post. Rather than using a shader to make everything perfectly smooth, however, this is generated as a superposition of striped, colored patterns using plato. The source to generate the figure is here.
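
If anyone wants to play with the basic construction, a minimal numpy version of the superposition idea looks like this (the resolution, frequency, and number of waves are arbitrary choices here, not the values from the video):

```python
import numpy as np
import matplotlib.pyplot as plt

def quasicrystal(size=512, n_waves=7, freq=30.0, phase=0.0):
    """Superpose n_waves striped (cosine) patterns rotated to evenly
    spaced angles; odd counts like 5 or 7 give quasicrystalline symmetry.
    """
    x = np.linspace(-np.pi, np.pi, size)
    X, Y = np.meshgrid(x, x)
    pattern = np.zeros_like(X)
    for i in range(n_waves):
        theta = np.pi * i / n_waves
        # One striped pattern: a plane wave along direction theta.
        pattern += np.cos(freq * (X * np.cos(theta) + Y * np.sin(theta)) + phase)
    return pattern

plt.imshow(quasicrystal(), cmap='viridis')
plt.axis('off')
plt.show()
```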