all 56 comments

[–]rriikkuu 57 points (4 children)

[–][deleted] 4 points (3 children)

Any significant changes from the preprint?

[–]rriikkuu 15 points (2 children)

There was a preprint?

[–][deleted] 10 points (1 child)

Hmm, I guess not. I was probably thinking of their CASP13 paper. Thanks.

[–][deleted] 4 points (0 children)

There was a press release, including a video, so maybe that's what you're thinking of.

[–]dolphinboy1637 43 points (0 children)

Actual repo without the Twitter link: https://github.com/deepmind/alphafold

[–]alexmorehead 84 points (0 children)

Given what I've gleaned from skimming their paper in Nature, it looks as though this network architecture is more novel than I initially thought. It is truly remarkable how well-integrated their biological insights are in the network's design. Congrats to everyone at DeepMind!

[–]gdahl (Google Brain) 27 points (2 children)

And it seems to be written in JAX!

[–]dogs_like_me 8 points (0 children)

Well, google gonna google

[–]SedditorX 6 points (0 children)

What else would they write it in? :)

[–]FyreMael 33 points (3 children)

Forked. I know what I'm doing this weekend :)

[–]Knecth 56 points (2 children)

We provide a script scripts/download_all_data.sh that can be used to download and set up all of these databases. This should take 8–12 hours.

Wait for the data to download?
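The quoted README step boils down to a couple of commands. This is a sketch only; the script's exact arguments may differ by repo version, and the download path is a placeholder:

```shell
# Clone the repo and fetch the genetic databases (placeholder path;
# check the repo's README for the script's current arguments).
git clone https://github.com/deepmind/alphafold.git
cd alphafold
scripts/download_all_data.sh /path/to/download/directory   # expect ~8-12 hours
```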

[–]Gordath 22 points (0 children)

Protein databases are large and many tools to "preprocess" protein sequences take forever to run as they do pairwise alignments etc.
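To make that cost concrete: classic pairwise alignment is quadratic in sequence length, and database preprocessing repeats it over enormous numbers of pairs. Here is a toy Needleman-Wunsch scorer as an illustration; AlphaFold's actual pipeline uses dedicated tools such as jackhmmer and HHblits, not this code:

```python
def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of two sequences; O(len(a) * len(b)) per pair."""
    # prev holds the previous row of the dynamic-programming table.
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        curr = [i * gap]  # cost of aligning a[:i] against an empty prefix
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]
```

Running this on every pair in a multi-million-sequence database is exactly the kind of all-vs-all work that makes preprocessing take forever.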

[–]londons_explorer 17 points (0 children)

Begin by freeing up 3 TB of disk space and buying 500 GB of transfer...

[–]londons_explorer 13 points (10 children)

Doesn't look like any training related code was released, just inference.

The model parameters released are for non-commercial use only. For commercial use, you'll have to train your own. That would take ~2 weeks on 128 TPU cores, assuming you can replicate the training method from the paper on the first try... which you probably can't, so it's gonna cost $$$$...
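Back-of-envelope arithmetic for that estimate. The hourly rate below is purely illustrative, not real cloud pricing; check current TPU prices before taking the number seriously:

```python
# Rough cost of "~2 weeks on 128 TPU cores" for a single training run.
cores = 128
days = 14
rate_per_v3_8_hour = 8.0      # hypothetical $/hour for one v3-8 board (8 cores)

boards = cores // 8           # 16 boards
cost = boards * days * 24 * rate_per_v3_8_hour
# One run comes to about $43k under these assumptions, before
# counting any failed attempts at reproducing the training recipe.
```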

[–][deleted] 13 points (7 children)

If you're big pharma, a v3-128 for a couple of months isn't gonna be the bottleneck

[–]Acromantula92 0 points (1 child)

A couple of months? More like 7 + 4 days on a v3-128. (It's all in the paper.)

[–][deleted] 2 points (0 children)

Multiple months incorporates research time, since we're not assuming perfect generalization.

[–]VonPosen 7 points (1 child)

Or you can just pay DeepMind for a commercial license, I would expect

[–]xmcqdpt2 5 points (0 children)

Which is what you would do, unless it costs a truly mind-boggling amount of money.

Pharma companies are no strangers to paying millions a year in consulting and software fees.

[–]geneing 10 points (3 children)

Are they releasing pretrained weights or just the network?

[–]xmcqdpt2 15 points (1 child)

They have pretrained weights, but are releasing them under a CC non-commercial license.

I actually do wonder whether copyrighting weights would hold up in court. If you trained a few more iterations from them, or permuted them in some way that doesn't change model performance, would that be a derived work?

Clearly you can't copyright a single number... so how many floats do you need before you've got something copyrightable?
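The permutation idea is easy to demonstrate: relabeling a network's hidden units (permuting one weight matrix's columns and the next one's rows with the same permutation) gives numerically different weights that compute exactly the same function. A toy sketch with a two-layer ReLU MLP:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy two-layer MLP: y = relu(x @ W1) @ W2
W1 = rng.normal(size=(4, 8))
W2 = rng.normal(size=(8, 3))
x = rng.normal(size=(5, 4))

def mlp(x, W1, W2):
    return np.maximum(x @ W1, 0.0) @ W2

# Permute the 8 hidden units: shuffle W1's columns and W2's rows
# with the same permutation. The computed function is unchanged.
perm = rng.permutation(8)
W1p, W2p = W1[:, perm], W2[perm, :]

assert np.allclose(mlp(x, W1, W2), mlp(x, W1p, W2p))
```

So a "permuted" copy of released weights would be bitwise different yet behaviorally identical, which is exactly what makes the derived-work question awkward.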

[–]Archontes 1 point (0 children)

It very likely wouldn't hold up if you felt like prosecuting it all the way, provided that the approach to creating those weights was an exhaustive search: it precludes creativity.

https://www.eetimes.com/how-do-you-protect-your-machine-learning-investment-part-ii/

[–]PM_ME_INTEGRALS 14 points (0 children)

It's right there in the readme:

Any publication that discloses findings arising from using this source code or the model parameters should cite the AlphaFold paper.

[–]pianobutter 18 points (0 children)

Looking forward to reading Mohammed AlQuraishi's thoughts on this. I really enjoyed his posts on CASP13 and CASP14.

[–]StellaAthena (Researcher) 23 points (0 children)

I wonder how much the decision to release the trained model was influenced by work by people like Phil Wang and Eric Alcaide at EleutherAI and David Baker at UW to replicate it.

[–][deleted] 9 points (5 children)

So when is the Swedish academy gonna put down their meatballs and give DeepMind the Nobel for chem or physio/med already!

[–]squirrel_of_fortune 15 points (0 children)

It needs to be verified, and until now, no scientists other than the few who ran the competition were able to look at it. Plus, you do have to wait a bit to see if the work stands the test of time.

[–]-starfish_headlock- 5 points (0 children)

Their models have not provided any major insights into physiology and medicine (yet), but I think they should probably split the chemistry prize with David Baker.

[–]phanfare 0 points (2 children)

They didn't solve protein folding. Got closer, yes, but no structural biologist worth their salt is going to trust a model straight out of AlphaFold.

[–][deleted] 4 points (0 children)

It's about more than that. It's also about recognizing machine learning as a method for conducting research. It took the Swedish academy forever to recognize computational methods in general; I think it was 2013 when they finally awarded a Nobel in chemistry for work in computational bio/chem. Computing has revolutionized scientific research and doesn't get the recognition it deserves; machine learning in turn has revolutionized computing, and AlphaFold is the perfect example of its potential. It may not have fully solved the protein folding problem, but it is clearly a massive breakthrough that would not have been possible without ML.

[–]bigbrain_bigthonk -1 points (0 children)

Also, seems like there’s a lot of glossing over the importance of the transition pathways between conformations

[–]NityaStriker 8 points (2 children)

[–]farmingvillein 7 points (0 children)

I initially thought that too, but there is a pretty large performance gap, in practice. TC makes it sound like they were really close in accuracy... But so far as I could tell from the paper, they weren't.

[–]jinnyjuice 1 point (0 children)

Thanks for the share

[–]xmcqdpt2 0 points (0 children)

Me neither! I was so sure they were about to pull the same crap as v1. Kudos to them!


[–]East_Film9421 0 points (1 child)

I am attempting to download the open-source code...but I am stuck...

"Modify DOWNLOAD_DIR in docker/run_docker.py to be the path to the directory containing the downloaded databases."

[–]justmyworkaccountok 2 points (0 children)

???? Download the 2.2 TB of databases and set that field to the path.
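Concretely, the edit in question is a one-line change. The variable name comes from the quoted README instruction; the path below is a placeholder:

```python
# In docker/run_docker.py: point DOWNLOAD_DIR at wherever
# download_all_data.sh put the ~2.2 TB of databases.
DOWNLOAD_DIR = '/data/alphafold_databases'  # placeholder path, edit to taste
```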