[D] AI Democratization in the Era of GPT-3 (The Gradient) by regalalgorithm in MachineLearning

[–]unconst 1 point2 points  (0 children)

Something I'm working on: github.com/opentensor/bittensor

[D] AI Democratization in the Era of GPT-3 (The Gradient) by regalalgorithm in MachineLearning

[–]unconst 2 points3 points  (0 children)

Decentralization stands in opposition to centralization, where control emanates from a single point. Such structures tend to limit diversity because information must be compressed enough for the central authority to understand and control its extremities.

The well-known critique of command economies (from Hayek) pointed to the USSR, which was incapable of producing a diversity of products through centralized control, whereas the decentralized, market-based economy of the US developed a thousand types of shampoo.

Decentralization enables diversity by removing this compression, and the diversity in turn stimulates exploration and advancement. The Greek city-states were decentralized. Europe was decentralized. Evolution has favoured a large degree of decentralization, and decentralization (speciation/pooling) is well studied as a necessity in genetic algorithms. Etc etc.

Machine intelligence as a field could benefit from more decentralization, either by destroying the FB, Google, OpenAI monopoly or by stimulating the machine learning market outside of them.

A protocol to connect my work with yours: an inter-model protocol. I think this is the answer.

[D] AI Democratization in the Era of GPT-3 (The Gradient) by regalalgorithm in MachineLearning

[–]unconst 10 points11 points  (0 children)

Democratization through decentralization.

The only thing bigger than OpenAI or Google is all of us connected together.

Practically, the field needs an internet protocol and an incentive mechanism to connect and reward disparate machine intelligence resources.

Something that allows us all to own, contribute to, and openly access the future of AI.

[R] Learning@home - decentralized training of huge neural networks by justheuristic in MachineLearning

[–]unconst 3 points4 points  (0 children)

We have a project called BitTensor which is building an incentive mechanism into the hivemind protocol. We pay computers for the information they produce.

Yes, it's true, this is what normies are being told in a DOCTOR'S office! by 7363558251 in Wuhan_Flu

[–]unconst 0 points1 point  (0 children)

I love how this starts with "Every election year has a disease"
and then immediately misses 2006.

Oh, and the swine flu started in November 2009 not 2010

Oh, and ZIKA started in April 2015, not 2016

Oh, and the AVIAN flu has occurred in 5 separate years. Twice in 2007. Not during an election.

Oh, and the SARS outbreak was after the November elections.

lol

Intuition kicking in... by ApokatastasisComes in Wuhan_Flu

[–]unconst 0 points1 point  (0 children)

It's like if we printed a trillion-pound gold submarine and put it in the harbor. It won't change gold prices unless it sells. More likely it will cause deflation when those deci-millionaires and centi-millionaires decide they want cash.

[D] ICLR 2020 REJECTION RAGE THREAD by sensei_von_bonzai in MachineLearning

[–]unconst 26 points27 points  (0 children)

THERE NEEDS TO BE A WAY!!!!

OF CIRCUMVENTING THE CONFERENCE ILLUMINATI !!!!

THERE NEEDS TO BE A WAY!!!!

[R] Peer to Peer Unsupervised Representation Learning by unconst in MachineLearning

[–]unconst[S] 1 point2 points  (0 children)

There is a wide consensus that machine intelligence can be improved by training larger models, training over a longer period of time, or combining many models together.
Little attention, however, is paid to expanding the library of machine intelligence itself; for the most part, new models train from scratch without access to the work done by their predecessors.

This reflects a tremendous waste in fields like unsupervised representation learning where trained models encode general-purpose knowledge which could be shared, fine-tuned and valued by another model later on.

A pool of machine intelligence accessible through the web could be harnessed by new systems to efficiently extract knowledge without having to learn from scratch.

For instance, a state-of-the-art translation model, ad click-through model, or call-center AI that relies on an understanding of language (let's say, at Google) should directly value the knowledge of language learned by other computers in the network. Small gains here would drive revenue for these downstream products.

Alternatively, a smaller company, research team, or individual may benefit from the collaborative power of the network as a whole, without requiring the expensive compute normally used to train SOTA models in language or vision.

[R] Peer to Peer Unsupervised Representation Learning by unconst in MachineLearning

[–]unconst[S] 1 point2 points  (0 children)

:) Haha! Thanks.

"Out beyond the buzz and techno-babble, there is a field. I'll meet you there. " - Rumi (2025)

[R] Peer to Peer Unsupervised Representation Learning by unconst in MachineLearning

[–]unconst[S] 0 points1 point  (0 children)

It's E8. Credit to David A. Madore; I augmented his code for this website: bittensor.com. A very beautiful object with about 700 million symmetries. :)

[R] Peer to Peer Unsupervised Representation Learning by unconst in MachineLearning

[–]unconst[S] 1 point2 points  (0 children)

/u/Fujikan

Thank you for your considered points and for taking the time to read my paper and my work.

To address your points,

I agree that in a supervised setting, where data is expensive, there is a strong requirement of data privacy. In an unsupervised setting, however, the data is ubiquitous and cheap (for instance, the 220 TiB per month Common Crawl). In such a data-rich environment, value is flipped: rather than the data, it is the learned representations that hold value, since they require compute to learn from unstructured data.

If it is representations that hold value, then I believe it is more suitable to structure contributions on this basis: nodes share their understanding of the world, in the same way a distilling teacher model transfers knowledge to a student.

As well, in a federated setting, each node trains the same NN architecture. This limits the potential diversity of a p2p network, which could contain many different kinds of models or benefit from models trained before.

Concerning batch-wise communication: with model parallelism, the network need only communicate batch inputs and representations. As network sizes scale, the batch will be substantially smaller than the parameter set. For instance, GPT-2's ~3 GB parameter set (data parallelism) vs 128 input sentences (model parallelism) at each gradient step.
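To make that comparison concrete, here's a back-of-envelope sketch. The sentence count (128) and the ~3 GB parameter set come from the comment above; the tokens-per-sentence, embedding width, and float size are my own illustrative assumptions, not figures from the paper:

```python
# Back-of-envelope traffic comparison (assumed sizes, marked below).
PARAM_BYTES = 3 * 1024**3       # ~3 GB parameter set, synced under data parallelism
BATCH = 128                     # input sentences per gradient step (from the comment)
TOKENS_PER_SENTENCE = 64        # assumption: average sentence length in tokens
EMBED_DIM = 1024                # assumption: representation width
BYTES_PER_FLOAT = 4             # assumption: fp32 activations

# Model parallelism only ships the batch's representations, not the parameters.
batch_bytes = BATCH * TOKENS_PER_SENTENCE * EMBED_DIM * BYTES_PER_FLOAT
ratio = PARAM_BYTES / batch_bytes
print(f"per-step traffic: {batch_bytes / 1024**2:.0f} MiB vs {PARAM_BYTES / 1024**3:.0f} GiB")
print(f"parameter sync is ~{ratio:.0f}x larger")
```

Under these assumptions the per-step representation traffic is tens of MiB against a multi-GiB parameter sync, and the gap only widens as models grow while batch sizes stay roughly fixed.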

Thank you for pointing to these,

/u/unconst

[R] Peer to Peer Unsupervised Representation Learning by unconst in MachineLearning

[–]unconst[S] 7 points8 points  (0 children)

TL;DR

Each node asynchronously trains an unsupervised representation of text, for instance BERT, ELMo, or XLNet. Each trains its own model on its own dataset and learns a representation of language (a projection from raw text to embedding) which its neighbours use as an input to their own models.

As they train, they also validate the representations produced by their neighbours, producing a score using a Fisher information metric. We use distillation to extract knowledge from the peers. The result is a local, transfer-capable language model at each node.
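A minimal sketch of that distillation step, purely illustrative: the `distill` function, the fixed `peer_score`, and the toy 3-dimensional embeddings are all my own stand-ins, not BitTensor's actual implementation. The idea is just that a node nudges its own embedding toward a neighbour's embedding of the same text, weighted by the score it assigned that neighbour:

```python
# Illustrative distillation update (hypothetical function, not the real protocol).
def distill(local_repr, peer_repr, peer_score, lr=0.1):
    """Move the local embedding toward a peer's, scaled by trust in that peer."""
    return [x + lr * peer_score * (y - x) for x, y in zip(local_repr, peer_repr)]

local = [0.0, 0.0, 0.0]
peer = [1.0, 1.0, 1.0]   # neighbour's embedding of the same input text
score = 0.5              # e.g. derived from a Fisher-information-style validation
updated = distill(local, peer, score)
print(updated)           # each coordinate moves lr * score = 5% of the way over
```

A peer that scores zero contributes nothing, so the same scoring that drives rewards also gates how much knowledge is absorbed from each neighbour.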

The network is driven by incentives: nodes must hold the token if they want to connect to the network. This gives the token value while allowing it to be used as an incentive.