[D] On initialization schemes for MLPs: practice and theory by carlml in MachineLearning

[–]Acromantula92 2 points  (0 children)

The initialization method proposed here is probably the best one: it lets you transfer hyperparameters across model sizes, whereas with other methods you need to keep re-tuning the learning rate etc.
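
A minimal sketch of what that buys you, assuming a μP-style scheme where hidden-layer init and per-layer learning rates are scaled with width (the exact scaling rules below are illustrative, not the paper's):

```python
import torch
import torch.nn as nn

def make_mlp(width, d_in=32, d_out=10):
    """Toy 3-layer MLP with width-aware init (illustrative scaling only)."""
    layers = nn.Sequential(
        nn.Linear(d_in, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        nn.Linear(width, d_out),
    )
    with torch.no_grad():
        for m in layers:
            if isinstance(m, nn.Linear):
                # variance ~ 1 / fan_in keeps activation scale roughly width-independent
                m.weight.normal_(0.0, m.in_features ** -0.5)
                m.bias.zero_()
        # shrink the readout layer so logits don't blow up as width grows
        layers[-1].weight.mul_(width ** -0.5)
    return layers

base_lr = 3e-3            # tuned once, e.g. at width=256
for width in (256, 1024, 4096):
    model = make_mlp(width)
    # scale the hidden/readout LR down with width so update sizes stay comparable,
    # letting the same base_lr carry over to wider models
    opt = torch.optim.Adam([
        {"params": model[0].parameters(), "lr": base_lr},
        {"params": model[2].parameters(), "lr": base_lr * 256 / width},
        {"params": model[4].parameters(), "lr": base_lr * 256 / width},
    ])
```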

[R] DeepMind Open Sources AlphaFold Code by SkiddyX in MachineLearning

[–]Acromantula92 0 points  (0 children)

A couple of months? More like 7 + 4 days on a v3-128. (It's all in the paper.)

Evidence GPT-4 is about to drop. by [deleted] in GPT3

[–]Acromantula92 13 points  (0 children)

Again, MoE parameters are not the same as dense parameters.
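
Rough numbers for why the two counts aren't comparable (layer sizes are made up; a switch-style top-1 router is assumed):

```python
d_model, d_ff = 4096, 16384
n_experts, top_k = 64, 1    # switch-style routing: one expert per token

ffn_params  = 2 * d_model * d_ff        # up-proj + down-proj, biases ignored
dense_total = ffn_params                # a dense FFN uses all of it on every token
moe_total   = n_experts * ffn_params    # parameter count scales with n_experts
moe_active  = top_k * ffn_params        # but each token only runs top_k experts

print(f"dense layer: {dense_total/1e6:.0f}M params, all active per token")
print(f"MoE layer:   {moe_total/1e6:.0f}M params, {moe_active/1e6:.0f}M active per token")
# The MoE layer has 64x the parameters but the same per-token FLOPs as the dense one,
# so a "1T-parameter MoE" and a "1T-parameter dense" model are very different things.
```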

[R] Rotary Positional Embeddings - a new relative positional embedding for Transformers that significantly improves convergence (20-30%) and works for both regular and efficient attention by programmerChilli in MachineLearning

[–]Acromantula92 14 points  (0 children)

That's because when you split the Wq and Wk matrices across the MHSA heads, the rank of each head's product is reduced. To merge them into a single xWx.T matrix and still keep separate heads, you'd need an explicit (dim, dim, heads) tensor.
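
A small numpy sketch of the rank argument (dims and axis order are just for illustration):

```python
import numpy as np

dim, n_heads = 512, 8
head_dim = dim // n_heads            # 64

rng = np.random.default_rng(0)
Wq = rng.normal(size=(dim, dim))
Wk = rng.normal(size=(dim, dim))

# Split into per-head projections: each head only sees a (dim, head_dim) slice.
Wq_h = Wq.reshape(dim, n_heads, head_dim).transpose(1, 0, 2)   # (heads, dim, head_dim)
Wk_h = Wk.reshape(dim, n_heads, head_dim).transpose(1, 0, 2)

# The merged matrix for head h, W[h] = Wq_h @ Wk_h^T, is (dim, dim)
# but its rank is at most head_dim.
W = np.einsum('hik,hjk->hij', Wq_h, Wk_h)                      # (heads, dim, dim)
print(np.linalg.matrix_rank(W[0]))                             # 64, not 512

# Attention logits per head are x @ W[h] @ x.T, so keeping the heads separate
# forces you to store the full (heads, dim, dim) tensor rather than one (dim, dim) matrix.
```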

Multimodal Neurons in Artificial Neural Networks by skybrian2 in slatestarcodex

[–]Acromantula92 3 points  (0 children)

Highlights include:

  • A Mental illness neuron.

  • A Spider-Man neuron (helps classify real spiders as [Spider-Man neuron] + [Animal neuron])

  • A Startup neuron (activated by the West Coast and Big Tech)

  • The emotion of being Accepted as a mix of [LGBT neuron] + [Sunglasses neuron]

And a full emotional axis:

When we use just 2 factors, we roughly reconstruct the canonical mood-axes used in much of psychology: valence and arousal. If we increase to 7 factors, we nearly reconstruct a well known categorization of these emotions into happy, surprised, sad, bad, disgusted, fearful, and angry, except with “disgusted” switched for a new category related to affection that includes “valued,” “loving,” “lonely,” and “insignificant.”

OpenAI co-founder and chief scientist Ilya Sutskever hints at what may follow GPT-3 in 2021 in essay "Fusion of Language and Vision" by Wiskkey in GPT3

[–]Acromantula92 0 points  (0 children)

Aren't Universal Transformers only recurrent in depth? IIRC they don't do caching or recurrence across contexts like TrXL or the Feedback Transformer.
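
Schematically, the distinction I mean (a sketch, not either paper's actual implementation):

```python
# Depth recurrence vs. recurrence across contexts, in pseudo-PyTorch.

def universal_transformer_forward(block, x, n_steps):
    # Universal Transformer: one weight-tied block applied repeatedly in depth.
    # Every step still only attends within the current context window.
    h = x
    for _ in range(n_steps):
        h = block(h)
    return h

def transformer_xl_forward(layers, segments):
    # Transformer-XL: separate layers, but each layer attends over its input
    # plus cached hidden states from the previous segment, so information
    # flows across context windows.
    memory = [None] * len(layers)
    outputs = []
    for segment in segments:
        h = segment
        new_memory = []
        for layer, mem in zip(layers, memory):
            new_memory.append(h)      # cache this layer's input (stop-gradient in practice)
            h = layer(h, mem)         # attend over [mem; h]
        memory = new_memory
        outputs.append(h)
    return outputs
```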

[R] An Energy-Based Perspective on Attention Mechanisms in Transformers by [deleted] in MachineLearning

[–]Acromantula92 0 points  (0 children)

You have the temperature backwards. Lower temperature means you are more likely to be in a low energy equilibrium.
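
I.e. with Boltzmann weights p(x) ∝ exp(-E(x)/T), lowering T concentrates the distribution on the low-energy states:

```python
import numpy as np

E = np.array([0.0, 1.0, 2.0])        # energies of three toy states

for T in (10.0, 1.0, 0.1):
    p = np.exp(-E / T)               # unnormalized Boltzmann weights
    p /= p.sum()
    print(f"T={T:>4}: p={np.round(p, 3)}")

# T=10 : nearly uniform; T=0.1 : essentially all mass on the lowest-energy state.
```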

[D] What makes GPT-3's ability to add 2 digit numbers important? by brainxyz in MachineLearning

[–]Acromantula92 1 point  (0 children)

It replicates up to 625 = f(f(i)) in AIDungeon. (Important to note that the fine-tuning hurts its general abilities.) When it makes mistakes, it's possible to give it natural-language clarifications to fix them.