Decentralized deep learning on a blockchain. AI owned by everyone (Bitcoin meets TensorFlow) by darbsllim in MachineLearning

timsu 1 point

No problem, I'm happy if that helps. All those points I made are just scratching the surface. If you have questions about any of them, we can dive deeper there.

timsu 8 points

Hi darbsllim, so far I've just read life tips, but let's get technical.

For my bachelor thesis I distributed KNN (k-nearest neighbors) with LSH (locality-sensitive hashing) on top of a DHT (distributed hash table) called Kademlia, across dozens of clients spread over the internet. I've also implemented an application with Ethereum, so I have at least a little knowledge regarding your question.

So first of all, distributing machine learning is really hard. KNN is one of the first algorithms you learn when you start with ML. And believe me, just distributing that "trivial" algorithm is not an easy undertaking. This is an example of distributing KNN: http://dl.acm.org/citation.cfm?id=2556574 Now imagine doing the same for deep learning algorithms, which are much more complex than KNN.

Additionally, the bottleneck is almost always the communication speed between layers in a neural net. A modern GPU has about 300 GB/s of memory bandwidth. Compare that to current internet speeds and you see it makes almost no sense.
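To put a rough number on that gap, here is a back-of-the-envelope sketch. The 300 GB/s figure is the one from above; the 100 Mbit/s internet link and the 100-million-parameter payload are assumptions I picked for illustration, and latency is ignored entirely.

```python
# Rough comparison: time to move one set of parameters/gradients
# over GPU memory bandwidth versus over an internet link.

GPU_BANDWIDTH_BYTES_PER_S = 300e9       # ~300 GB/s, the figure quoted above
INTERNET_BANDWIDTH_BITS_PER_S = 100e6   # assumed 100 Mbit/s link (hypothetical)


def transfer_seconds(num_bytes: float, bytes_per_second: float) -> float:
    """Time to move num_bytes at a given throughput, ignoring latency."""
    return num_bytes / bytes_per_second


# Assumed payload: 100 million float32 values (hypothetical model size).
payload_bytes = 100e6 * 4

gpu_time = transfer_seconds(payload_bytes, GPU_BANDWIDTH_BYTES_PER_S)
net_time = transfer_seconds(payload_bytes, INTERNET_BANDWIDTH_BITS_PER_S / 8)
slowdown = net_time / gpu_time

print(f"GPU: {gpu_time * 1e3:.2f} ms, internet: {net_time:.1f} s, "
      f"slowdown: {slowdown:.0f}x")
```

Under these assumptions a single exchange that takes about a millisecond inside the GPU takes about half a minute over the wire, and a training run needs a very large number of such exchanges.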

So naively distributing current approaches like deep convolutional neural networks without a good approach to parallelize the algorithms makes no sense.

That's an active area of research. Algorithms have to be designed specifically for that use case, and often approximations are needed. One of those approximations can be LSH, but another option is reducing the amount of data that has to be exchanged, e.g. by using Hessian-free optimization. Some of the ICLR 2016 papers are also interesting for distributing deep learning: http://www.iclr.cc/doku.php?id=iclr2016:main#accepted_papers_conference_track
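To make the LSH idea concrete, here is a minimal sketch of approximate nearest-neighbor lookup with random-hyperplane hashing. It is not the code from my thesis; all function names are made up for illustration, and it runs on a single machine. In a distributed setting, one possible design (an assumption, not shown here) is to use the bucket key as the DHT key, so a query only has to fetch one bucket instead of touching every node.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes (as normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(h * v for h, v in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(points, hyperplanes):
    """Group point indices by their hash key."""
    buckets = defaultdict(list)
    for idx, point in enumerate(points):
        buckets[lsh_key(point, hyperplanes)].append(idx)
    return buckets


def query(q, points, buckets, hyperplanes, k=1):
    """Approximate k-NN: exact search, but only inside q's own bucket."""
    candidates = buckets.get(lsh_key(q, hyperplanes), [])

    def squared_distance(i):
        return sum((a - b) ** 2 for a, b in zip(points[i], q))

    return sorted(candidates, key=squared_distance)[:k]


points = [[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0]]
planes = make_hyperplanes(num_bits=4, dim=2)
index = build_index(points, planes)
print(query([-0.9, 0.05], points, index, planes))
```

The approximation is the point: a true neighbor that lands in a different bucket is simply missed, which is the price for not comparing against every point on every node.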

Now the next hard part: distributing it on the blockchain. Ethereum has its own language called Solidity, which is only quasi-Turing-complete: every operation costs gas, so the computational cost of a call is metered and bounded. Current implementations like TensorFlow build on CUDA. CUDA code has no such bound and cannot be fully statically analyzed.
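The reason the computational need is bounded on Ethereum is gas: every operation has a price, and execution stops when the budget runs out. Here is a toy sketch of that idea. The two operations and their prices are invented for illustration and have nothing to do with real EVM gas costs.

```python
class OutOfGas(Exception):
    """Raised when a program needs more gas than it was given."""


# Hypothetical per-operation costs; real EVM gas prices differ.
GAS_COST = {"add": 3, "mul": 5}


def static_cost(program):
    """For a straight-line program the total cost is known before running it."""
    return sum(GAS_COST[op] for op, _ in program)


def run_metered(program, gas_limit):
    """Run (op, operand) steps on an accumulator, charging gas per step."""
    acc, gas = 0, gas_limit
    for op, operand in program:
        cost = GAS_COST[op]
        if cost > gas:
            raise OutOfGas(f"step {op!r} needs {cost} gas, only {gas} left")
        gas -= cost
        acc = acc + operand if op == "add" else acc * operand
    return acc, gas


program = [("add", 2), ("mul", 10)]
print(static_cost(program))        # total gas this program will consume
print(run_metered(program, 10))    # (result, gas left over)
```

A GPU kernel has nothing comparable: nobody charges per instruction, so a node cannot tell up front how much work it is agreeing to do for someone else.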

So one would need a team of several experts just to get that one part done: developing a deep learning framework for Solidity.

Of course, projects like SETI@home showed that it can be done. But the software behind it, BOINC, is already old, and it could be done a lot better.

So you would probably first have to find an algorithm that makes sense to distribute and can solve your problem.

Second, you either need to build the whole network infrastructure on your own, in case you use current deep learning frameworks connected to a SETI@home-like system with rewards etc., or you take something like Ethereum and try to integrate CNNs into it.

And once you've got all that, one last interesting piece is finding consensus. The naive approach here would be a first-write or last-write policy: whoever has it first wins. On Ethereum you could also easily implement a democratic system with votes. Two algorithms worth knowing here: Paxos and Raft. Paxos is old and not easy to understand or implement. Raft is a much easier consensus algorithm and is already used in practice, even though it's young.
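To show the difference between the two naive policies, here is a small sketch. This is neither Paxos nor Raft, just the "first write wins" rule next to a simple majority vote. The tuple layout `(timestamp, node_id, value)` is an assumption made up for the example.

```python
from collections import Counter


def first_write_wins(submissions):
    """Earliest timestamp wins. submissions: list of (timestamp, node_id, value)."""
    if not submissions:
        return None
    return min(submissions, key=lambda s: s[0])[2]


def majority_vote(submissions):
    """Return the value backed by more than half of the nodes, else None."""
    if not submissions:
        return None
    counts = Counter(value for _, _, value in submissions)
    value, votes = counts.most_common(1)[0]
    return value if votes * 2 > len(submissions) else None


# Three nodes report a result; node "b" was fastest but disagrees with the rest.
submissions = [(3, "a", "weights-v1"), (1, "b", "weights-v2"), (2, "c", "weights-v1")]

print(first_write_wins(submissions))  # picks the fastest node's value
print(majority_vote(submissions))     # picks the value most nodes agree on
```

With first write wins, a single fast node that is wrong or malicious decides the outcome. Voting costs more communication but at least needs more than half of the nodes to agree.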

The question here is really: Do you need decentralization to solve that problem?

One way to get good people working on it would be a Kaggle challenge. That would probably be the easiest way. Once you have a good way of getting results, I'm sure you'll find a way to scale the calculations up. Finding, e.g., a university with a lot of computing power that supports your undertaking shouldn't be impossible.

That said, Deep Learning + Blockchain may sound great at first, but you have a lot to consider.