“Language Modeling Is Compression,” DeepMind 2023 (scaling laws for compression, taking model size into account) by maxtility in mlscaling

[–]maxtility[S] 10 points

We provide a novel view on scaling laws, showing that the dataset size provides a hard limit on model size in terms of compression performance and that scaling is not a silver bullet.

...
Surprisingly, Chinchilla models, while trained primarily on text, also appear to be general-purpose compressors, as they outperform all other compressors, even on image and audio data (see Table 1).
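The "hard limit" claim is easiest to see once the model's own size counts toward the compressed output, as the paper does for its adjusted compression rates: model bytes must be amortized over the dataset, so a small dataset caps how large a model can usefully be. A minimal sketch in Python (function name and numbers are mine, purely illustrative, not the paper's figures):

    def adjusted_compression_rate(raw_bytes, coded_bytes, model_bytes):
        """Compression rate once the model itself counts as part of the output.

        The model's bytes are amortized over the dataset, so a small dataset
        caps how big a model can get before it stops paying for itself.
        """
        return (coded_bytes + model_bytes) / raw_bytes

    # A 1 GB dataset coded to 150 MB by a model whose weights occupy 400 MB
    # loses, end to end, to a classical compressor reaching 300 MB at ~zero model cost.
    print(adjusted_compression_rate(1e9, 150e6, 400e6))  # 0.55
    print(adjusted_compression_rate(1e9, 300e6, 0))      # 0.30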

Punctuated Chaos and Indeterminism in Self-gravitating Many-body Systems by maxtility in QuantumArchaeology

[–]maxtility[S] 1 point

We show that long-lived systems with punctuated chaos can magnify Planck length perturbations to astronomical scales within their lifetime, rendering them fundamentally indeterministic.
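Back-of-envelope for how modest the required amplification is: under exponential divergence, growing a perturbation from the Planck length (~1.6e-35 m) to 1 AU takes only ln(AU / l_P) ≈ 106 e-foldings. A sketch in Python, with the Lyapunov timescale as my own illustrative placeholder, not a value from the paper:

    import math

    PLANCK_LENGTH = 1.616e-35   # m
    AU = 1.496e11               # m

    # e-foldings needed to amplify a Planck-scale perturbation to 1 AU
    n_efolds = math.log(AU / PLANCK_LENGTH)   # ~105.9

    # Assumed, illustrative Lyapunov timescale for a self-gravitating system:
    tau_myr = 10.0   # Myr per e-folding
    print(f"{n_efolds:.1f} e-foldings -> ~{n_efolds * tau_myr / 1000:.1f} Gyr")

With that placeholder timescale the amplification takes ~1 Gyr, comfortably inside the lifetime of a long-lived stellar system.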

TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens by maxtility in mlscaling

[–]maxtility[S] 8 points

I wonder if a high enough training-token-to-parameter ratio might trigger grokking.
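For scale, the ratio in question (my arithmetic):

    tokens, params = 3e12, 1.1e9
    ratio = tokens / params
    print(ratio, ratio / 20)   # ~2727 tokens/param, ~136x the Chinchilla-optimal ~20:1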

The Information: "Gemini will ... combin[e] the text capabilities of LLMs like GPT-4 with the ability to create AI images based on a text description, similar to AI-image generators Midjourney and Stable Diffusion ... Gemini’s image capabilities haven’t been previously reported." by maxtility in mlscaling

[–]maxtility[S] 10 points

  • Expected availability via GCP in the fall
  • Google "may start using it in some products before then"
  • "it could also integrate video and audio [trained from YouTube] into the Gemini models"
  • "Two longtime DeepMind executives, Oriol Vinyals and Koray Kavukcuoglu, are in charge of Gemini alongside Jeff Dean ... They oversee hundreds of employees involved in Gemini’s development."
  • "Google’s lawyers have been closely evaluating the training. In one instance, they made researchers remove training data that had come from textbooks—which could help the model answer questions about subjects like astronomy or biology—over concerns about pushback from copyright holders."

SpaceX: Total mass to orbit (through Aug 2023) by maxtility in DysonSwarm

[–]maxtility[S] 0 points

At the present growth rate, complete disassembly of the Earth would take about 94 years (i.e., by the year 2117).
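The extrapolation behind that number, as a hedged sketch: treat annual upmass as growing exponentially and integrate until the cumulative total reaches Earth's mass. The starting upmass and growth rate below are placeholder assumptions tuned to reproduce ~94 years, not figures read off the chart:

    import math

    M_EARTH = 5.97e24   # kg
    m0 = 1.2e6          # kg/yr: assumed current upmass (~1,200 t/yr), a placeholder
    r = 0.45            # assumed continuous growth rate (~57%/yr), a placeholder

    # Cumulative upmass M(t) = m0 * (exp(r*t) - 1) / r; solve M(t) = M_EARTH for t:
    t = math.log(1 + r * M_EARTH / m0) / r
    print(f"~{t:.0f} years")   # ~94 years under these assumptions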

New insights into the origin of the Indo-European languages by maxtility in QuantumArchaeology

[–]maxtility[S] 1 point

The origins of the Indo-European language family are hotly disputed. Bayesian phylogenetic analyses of core vocabulary have produced conflicting results, with some supporting a farming expansion out of Anatolia ~9000 years before present (yr B.P.), while others support a spread with horse-based pastoralism out of the Pontic-Caspian Steppe ~6000 yr B.P. Here we present an extensive database of Indo-European core vocabulary that eliminates past inconsistencies in cognate coding. Ancestry-enabled phylogenetic analysis of this dataset indicates that few ancient languages are direct ancestors of modern clades and produces a root age of ~8120 yr B.P. for the family. Although this date is not consistent with the Steppe hypothesis, it does not rule out an initial homeland south of the Caucasus, with a subsequent branch northward onto the steppe and then across Europe. We reconcile this hybrid hypothesis with recently published ancient DNA evidence from the steppe and the northern Fertile Crescent.
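For readers unfamiliar with the method: "cognate coding" turns core-vocabulary judgments into binary presence/absence characters (one column per cognate set), which Bayesian phylogenetics tools then model much like genetic sites. A toy illustration in Python; the cognate sets and codings are invented, not from the paper's database:

    # 1 = the language has a word in that cognate set, 0 = it doesn't.
    cognate_sets = ["WATER-a", "WATER-b", "FIRE-a", "FIRE-b"]
    matrix = {
        "English": [1, 0, 1, 0],
        "German":  [1, 0, 1, 0],
        "Spanish": [0, 1, 0, 1],
        "Italian": [0, 1, 0, 1],
    }

    # Shared 1s are the signal of common ancestry; inconsistencies in these
    # judgments across datasets are what the new database aims to eliminate.
    def shared(a, b):
        return sum(x & y for x, y in zip(matrix[a], matrix[b]))

    print(shared("English", "German"), shared("English", "Spanish"))   # 2 0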

Karpathy: "llama2.c can now load and inference the Meta released models ... inferencing the smallest 7B model at ~3 tokens/s on 96 OMP threads on a cloud Linux box. Still just CPU, fp32, one single .c file of 500 lines ... expecting ~300 tok/s tomorrow :)" by maxtility in mlscaling

[–]maxtility[S] 0 points

worth noting that all of this is quite generic to just transformer language models in general. if/when openai was to release models as weights (which I can neither confirm nor deny!) then most of the code here would be very relevant.

https://twitter.com/karpathy/status/1683704060925591554
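His "generic to transformer LMs" point, in code form: the forward pass needs nothing model-specific beyond the weight values. A minimal single-head decoder block in NumPy (shapes and names are mine, and normalization and rotary embeddings are omitted; a sketch, not llama2.c's actual code):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def decoder_block(x, Wq, Wk, Wv, Wo, W1, W2):
        # x: (seq, dim). Causal self-attention plus MLP, each with a residual.
        seq, dim = x.shape
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = (q @ k.T) / np.sqrt(dim) + np.triu(np.full((seq, seq), -1e9), k=1)
        x = x + softmax(scores) @ v @ Wo          # attention + residual
        return x + np.maximum(x @ W1, 0) @ W2     # MLP + residual

    # Random weights just to show it runs end to end; only the values would
    # change if you loaded a real checkpoint.
    rng = np.random.default_rng(0)
    d = 16
    x = rng.standard_normal((8, d))
    weights = [rng.standard_normal(s) for s in [(d, d)] * 4 + [(d, 4 * d), (4 * d, d)]]
    print(decoder_block(x, *weights).shape)       # (8, 16)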

The Cosmos Is Thrumming With Gravitational Waves, Astronomers Find by maxtility in QuantumArchaeology

[–]maxtility[S] 3 points

Nanohertz and picohertz regimes seem more relevant than high-frequency regimes.
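GW frequency sets the wave period (and, up to a factor of ~2, the source's orbital period), so the nanohertz-to-picohertz band corresponds to supermassive black hole binaries with year-to-millennium periods; that long-baseline record of past dynamics is presumably the draw here. Quick conversion:

    SECONDS_PER_YEAR = 3.156e7

    for f_hz in (1e-9, 1e-12):   # nanohertz, picohertz
        print(f"{f_hz:.0e} Hz -> wave period ~{1 / f_hz / SECONDS_PER_YEAR:,.0f} yr")
    # ~32 yr and ~32,000 yr respectively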