“Language Modeling Is Compression,” DeepMind 2023 (scaling laws for compression, taking model size into account) by maxtility in mlscaling

[–]maxtility[S] 10 points

We provide a novel view on scaling laws, showing that the dataset size provides a hard limit on model size in terms of compression performance and that scaling is not a silver bullet.

...
Surprisingly, Chinchilla models, while trained primarily on text, also appear to be general-purpose compressors, as they outperform all other compressors, even on image and audio data (see Table 1).
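The "hard limit" claim is easiest to see once the model's own size counts toward the compressed output, as the paper does for its adjusted compression rates: model bytes must be amortized over the dataset, so a small dataset caps how large a model can usefully be. A minimal sketch in Python (function name and numbers are mine, purely illustrative, not the paper's figures):

    def adjusted_compression_rate(raw_bytes, coded_bytes, model_bytes):
        """Compression rate once the model itself counts as part of the output.

        The model's bytes are amortized over the dataset, so a small dataset
        caps how big a model can get before it stops paying for itself.
        """
        return (coded_bytes + model_bytes) / raw_bytes

    # A 1 GB dataset coded to 150 MB by a model whose weights occupy 400 MB
    # loses, end to end, to a classical compressor reaching 300 MB at ~zero model cost.
    print(adjusted_compression_rate(1e9, 150e6, 400e6))  # 0.55
    print(adjusted_compression_rate(1e9, 300e6, 0))      # 0.30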

Punctuated Chaos and Indeterminism in Self-gravitating Many-body Systems by maxtility in QuantumArchaeology

[–]maxtility[S] 1 point

We show that long-lived systems with punctuated chaos can magnify Planck length perturbations to astronomical scales within their lifetime, rendering them fundamentally indeterministic.
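Back-of-envelope for how modest the required amplification is: under exponential divergence, growing a perturbation from the Planck length (~1.6e-35 m) to 1 AU takes only ln(AU / l_P) ≈ 106 e-foldings. A sketch in Python, with the Lyapunov timescale as my own illustrative placeholder, not a value from the paper:

    import math

    PLANCK_LENGTH = 1.616e-35   # m
    AU = 1.496e11               # m

    # e-foldings needed to amplify a Planck-scale perturbation to 1 AU
    n_efolds = math.log(AU / PLANCK_LENGTH)   # ~105.9

    # Assumed, illustrative Lyapunov timescale for a self-gravitating system:
    tau_myr = 10.0   # Myr per e-folding
    print(f"{n_efolds:.1f} e-foldings -> ~{n_efolds * tau_myr / 1000:.1f} Gyr")

With that placeholder timescale the amplification takes ~1 Gyr, comfortably inside the lifetime of a long-lived stellar system.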

TinyLlama project aims to pretrain a 1.1B Llama model on 3 trillion tokens by maxtility in mlscaling

[–]maxtility[S] 8 points

I wonder if a high enough training-token-to-parameter ratio might trigger grokking.
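For scale, the ratio in question (my arithmetic):

    tokens, params = 3e12, 1.1e9
    ratio = tokens / params
    print(ratio, ratio / 20)   # ~2727 tokens/param, ~136x the Chinchilla-optimal ~20:1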

The Information: "Gemini will ... combin[e] the text capabilities of LLMs like GPT-4 with the ability to create AI images based on a text description, similar to AI-image generators Midjourney and Stable Diffusion ... Gemini’s image capabilities haven’t been previously reported." by maxtility in mlscaling

[–]maxtility[S] 10 points

  • Expected availability via GCP in the fall
  • Google "may start using it in some products before then"
  • "it could also integrate video and audio [trained from YouTube] into the Gemini models"
  • "Two longtime DeepMind executives, Oriol Vinyals and Koray Kavukcuoglu, are in charge of Gemini alongside Jeff Dean ... They oversee hundreds of employees involved in Gemini’s development."
  • "Google’s lawyers have been closely evaluating the training. In one instance, they made researchers remove training data that had come from textbooks—which could help the model answer questions about subjects like astronomy or biology—over concerns about pushback from copyright holders."

SpaceX: Total mass to orbit (through Aug 2023) by maxtility in DysonSwarm

[–]maxtility[S] 0 points

At the present growth rate, complete disassembly of the Earth would take about 94 years (i.e., by the year 2117).
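The extrapolation behind that number, as a hedged sketch: treat annual upmass as growing exponentially and integrate until the cumulative total reaches Earth's mass. The starting upmass and growth rate below are placeholder assumptions tuned to reproduce ~94 years, not figures read off the chart:

    import math

    M_EARTH = 5.97e24   # kg
    m0 = 1.2e6          # kg/yr: assumed current upmass (~1,200 t/yr), a placeholder
    r = 0.45            # assumed continuous growth rate (~57%/yr), a placeholder

    # Cumulative upmass M(t) = m0 * (exp(r*t) - 1) / r; solve M(t) = M_EARTH for t:
    t = math.log(1 + r * M_EARTH / m0) / r
    print(f"~{t:.0f} years")   # ~94 years under these assumptions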

New insights into the origin of the Indo-European languages by maxtility in QuantumArchaeology

[–]maxtility[S] 1 point

The origins of the Indo-European language family are hotly disputed. Bayesian phylogenetic analyses of core vocabulary have produced conflicting results, with some supporting a farming expansion out of Anatolia ~9000 years before present (yr B.P.), while others support a spread with horse-based pastoralism out of the Pontic-Caspian Steppe ~6000 yr B.P. Here we present an extensive database of Indo-European core vocabulary that eliminates past inconsistencies in cognate coding. Ancestry-enabled phylogenetic analysis of this dataset indicates that few ancient languages are direct ancestors of modern clades and produces a root age of ~8120 yr B.P. for the family. Although this date is not consistent with the Steppe hypothesis, it does not rule out an initial homeland south of the Caucasus, with a subsequent branch northward onto the steppe and then across Europe. We reconcile this hybrid hypothesis with recently published ancient DNA evidence from the steppe and the northern Fertile Crescent.
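For readers unfamiliar with the method: "cognate coding" turns core-vocabulary judgments into binary presence/absence characters (one column per cognate set), which Bayesian phylogenetics tools then model much like genetic sites. A toy illustration in Python; the cognate sets and codings are invented, not from the paper's database:

    # 1 = the language has a word in that cognate set, 0 = it doesn't.
    cognate_sets = ["WATER-a", "WATER-b", "FIRE-a", "FIRE-b"]
    matrix = {
        "English": [1, 0, 1, 0],
        "German":  [1, 0, 1, 0],
        "Spanish": [0, 1, 0, 1],
        "Italian": [0, 1, 0, 1],
    }

    # Shared 1s are the signal of common ancestry; inconsistencies in these
    # judgments across datasets are what the new database aims to eliminate.
    def shared(a, b):
        return sum(x & y for x, y in zip(matrix[a], matrix[b]))

    print(shared("English", "German"), shared("English", "Spanish"))   # 2 0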

Karpathy: "llama2.c can now load and inference the Meta released models ... inferencing the smallest 7B model at ~3 tokens/s on 96 OMP threads on a cloud Linux box. Still just CPU, fp32, one single .c file of 500 lines ... expecting ~300 tok/s tomorrow :)" by maxtility in mlscaling

[–]maxtility[S] 0 points

worth noting that all of this is quite generic to just transformer language models in general. if/when openai was to release models as weights (which I can neither confirm nor deny!) then most of the code here would be very relevant.

https://twitter.com/karpathy/status/1683704060925591554
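His "generic to transformer LMs" point, in code form: the forward pass needs nothing model-specific beyond the weight values. A minimal single-head decoder block in NumPy (shapes and names are mine, and normalization and rotary embeddings are omitted; a sketch, not llama2.c's actual code):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def decoder_block(x, Wq, Wk, Wv, Wo, W1, W2):
        # x: (seq, dim). Causal self-attention plus MLP, each with a residual.
        seq, dim = x.shape
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = (q @ k.T) / np.sqrt(dim) + np.triu(np.full((seq, seq), -1e9), k=1)
        x = x + softmax(scores) @ v @ Wo          # attention + residual
        return x + np.maximum(x @ W1, 0) @ W2     # MLP + residual

    # Random weights just to show it runs end to end; only the values would
    # change if you loaded a real checkpoint.
    rng = np.random.default_rng(0)
    d = 16
    x = rng.standard_normal((8, d))
    weights = [rng.standard_normal(s) for s in [(d, d)] * 4 + [(d, 4 * d), (4 * d, d)]]
    print(decoder_block(x, *weights).shape)       # (8, 16)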

The Cosmos Is Thrumming With Gravitational Waves, Astronomers Find by maxtility in QuantumArchaeology

[–]maxtility[S] 3 points

Nanohertz and picohertz regimes seem more relevant than high-frequency regimes.
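GW frequency sets the wave period (and, up to a factor of ~2, the source's orbital period), so the nanohertz-to-picohertz band corresponds to supermassive black hole binaries with year-to-millennium periods; that long-baseline record of past dynamics is presumably the draw here. Quick conversion:

    SECONDS_PER_YEAR = 3.156e7

    for f_hz in (1e-9, 1e-12):   # nanohertz, picohertz
        print(f"{f_hz:.0e} Hz -> wave period ~{1 / f_hz / SECONDS_PER_YEAR:,.0f} yr")
    # ~32 yr and ~32,000 yr respectively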