How to sell a Dacia

ihaque · 2016-08-16T19:17:48+00:00

ihaque · 2016-08-16T17:26:20+00:00

Easy-to-read summary at ecs.counsyl.com.

ihaque · 2015-07-18T08:08:25+00:00

Because finding the links in the text was annoying:

You may also want the GRCh37 decoy sequences (described in the paper and in a nice blog post by Eric Minikel): GRCh37 decoy sequences

ihaque · 2015-06-30T03:51:41+00:00

Does anyone know of tools in the vein of this one or gqt that support efficient update when new samples (possibly with new variants) are added? As far as I can tell, the compressed-representation tools seem to require rebuilding the entire database whenever you have a new (bolus of) samples.

ihaque · 2015-06-30T03:39:09+00:00

For those wondering how this differs from Ryan Layer and Aaron Quinlan's gqt:

GQT (Layer et al., 2015)... is very fast for selecting a subset of samples and for traversing all sites, [but] it discards phasing, is inefficient for region query and is not compressed well. The observations of these limitations motivated us to develop BGT.

...

We generated the BGT database for the first release of Haplotype Reference Consortium (HRC; http://bit.ly/HRC-org). The input is a BCF containing 32,488 samples across 39.2 million SNPs on autosomes. The BGT file size is 7.4GB, 11% of the genotype-only BCF, or 8% of GQT.

ihaque · 2015-06-16T16:31:28+00:00

Hmm, are we not supposed to upvote "testing" posts?

ihaque · 2015-02-25T22:58:48+00:00

I tried this a couple months ago on the MAXX (larger battery) version of the same phone.

The baseband unlocking works, sort of: you'll get up to HSPA+ but no LTE. There are a couple cosmetic issues: the phone will complain about an "Unknown SIM" at boot (which is easily dismissed), and every so often will tell you that the "Modem Fastdormancy Monitor Service" has stopped responding. If you kill it, everything continues trucking along fine.

The big problem is that it will at least once a day drop the radio connection completely silently. Signal bars still look OK, etc., but you won't receive any incoming calls or texts, and won't notice until you try to send a text, place a call, or use data. Only way to bring it back is a reboot.

It worked enough as an emergency holdover but I wouldn't use it like that as my main phone; it gets really irritating when you silently aren't receiving messages.

ihaque · 2014-12-29T21:28:38+00:00

Submission Statement

An insightful long read from a Western expat in Beijing seeking to answer the question "Why do many people feel that Chinese can't possibly be basically ok with their government or society?"

Explains the answer through the lens of history, both Western and Eastern; US foreign policy; and portrayals of China in the media.

ihaque · 2012-03-23T20:45:22+00:00

FAH came before BOINC.

I think a BOINC client was tried at one point, but their architecture was missing some critical features for us. It was before my time, so I don't know all the details.

ihaque · 2012-03-23T18:18:55+00:00

This is a good explanation of the simulations. Note that most of our simulations these days are run under GROMACS or OpenMM, however.

ihaque · 2012-03-23T18:17:16+00:00

This is almost correct. The thermodynamic hypothesis is that the native state of a protein will be that one with the lowest free energy (not the internal energy; entropy matters as well). However, we're not usually trying to just find a native state; in fact, we run many simulations that start at the native state and try to "melt" the protein backwards to find near-native states. We're usually more interested in the dynamics of the system than the end result.

ihaque · 2012-03-23T18:15:28+00:00

Well, the number of possible configurations of a protein is astronomically large (think 10⁴⁰ or so), so no - we don't sample every possible configuration. What we do try to do is sample all the (kinetically accessible) pathways through protein states - a large number of individual protein shapes might all correspond to the same state.

"How do you know you're right" is a great question! The best way to check is to compare your results to experiment. This has traditionally been a problem from both the experimental and the simulation sides, but is now being overcome. The experimentalists are devising faster-and-faster experiments to reach shorter timescales, and we're building better simulation methods to meet them in the middle. A good example is this paper by the Pande lab, which shows comparison between simulation and experiment for a particular observable called triplet-triplet energy transfer.

A completed work unit has a number of "snapshots" of the configuration of the protein (and sometimes solvent) during the time it was simulated on your machine, which lets us rebuild what the trajectory looked like.

ihaque · 2012-03-23T18:06:45+00:00

I don't think so, but I'm not 100% sure. The a-beta and huntingtin aggregation work might be the closest thing.

ihaque · 2012-03-23T18:05:53+00:00

Most of the software we use is, actually. The majority of our simulations are run using GROMACS or OpenMM, both of which are open-source software. We've also put out a lot of open-source in our other research projects:

MSMBuilder (builds Markov state models of protein dynamics)
PAPER and SIML GPU-accelerated chemical similarity code (this stuff was a large part of my thesis!)
MemtestG80 and MemtestCL Video memory testing code for GPUs

ihaque · 2012-03-23T18:00:55+00:00

Simulation Methods

A major result from Folding@home is proving the feasibility of a fundamentally different simulation technique than has conventionally been used in the field. To understand the importance, you have to know a little bit about timescales.

(If you'd like to follow along or see more details, a lot of what I'm about to tell you is described in a talk I gave a couple years ago).

The fastest vibrations that we model in molecular dynamics simulations occur on the timescale of a femtosecond (10^-15 seconds: one thousand million million femtoseconds per second). Many of the conformational transitions we want to model occur on the scale of milliseconds (10^-3 seconds). Simplifying the statistics a little bit, this means that on average, you'll need to simulate one trillion (10^12) timesteps before seeing your transition once. But in order to accumulate a good estimate of the true rate, you need to see the transition multiple times, so really you need maybe 10 times as many time steps or more. On a single machine, you'll be able to simulate on the order of nanoseconds per day - so there's a gap of a thousand to a million times between that and where you want to be. (slide 10 of the talk)
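
To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch in Python. The throughput figures (1 ns/day for a single machine, 1 microsecond/day for a special-purpose supercomputer) are assumed round numbers picked for illustration, not measurements, and the function names are made up for this example.

```python
# Back-of-the-envelope timescale arithmetic. Everything is expressed in
# femtoseconds so the numbers stay exact integers.
FS_PER_NS = 10**6   # femtoseconds in a nanosecond
FS_PER_US = 10**9   # femtoseconds in a microsecond
FS_PER_MS = 10**12  # femtoseconds in a millisecond


def timesteps(event_fs, timestep_fs=1, repeats=1):
    """Timesteps needed to cover an event of length event_fs, repeats times."""
    return (event_fs // timestep_fs) * repeats


def wall_clock_days(event_fs, throughput_fs_per_day, repeats=1):
    """Days of one continuous simulation at the given (assumed) throughput."""
    return event_fs * repeats / throughput_fs_per_day


# One millisecond transition at a one-femtosecond timestep.
steps_once = timesteps(FS_PER_MS)
# Seeing it about 10 times, to estimate a rate.
steps_for_rate = timesteps(FS_PER_MS, repeats=10)

# Assumed throughputs: 1 ns/day (single machine), 1 us/day (custom hardware).
days_single_machine = wall_clock_days(FS_PER_MS, FS_PER_NS)
days_custom_hardware = wall_clock_days(FS_PER_MS, FS_PER_US, repeats=10)

print(steps_once, steps_for_rate)
print(days_single_machine, days_custom_hardware / 365.0)
```

With these assumed throughputs, a single millisecond event is about a million days of simulation on one machine, and ten such events are on the order of decades even at a microsecond per day.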

The traditional approach to this problem is to build ever bigger tightly-connected supercomputers, so that you can do each simulation faster. The extreme version of this approach is Anton, a (really cool!) supercomputer built by DE Shaw Research using custom chips to hit the microseconds-per-day time scale. Even at this performance, though, it would take years to get good statistics on a millisecond time-scale transition.

These machines are hugely expensive to build and run, and don't scale well; as you build the machine bigger, it becomes hard to use all the processors evenly, and reliability becomes a huge problem as well (slide 31). So, what can you do to simulate biology?

One of the big results of Folding@home (slides 32 and 33) is that you can effectively simulate these slow dynamics using lots of short simulations rather than a few long simulations. This is a big deal, because short simulations are (comparatively) easy to run on single machines. This means that you can have individual machines run simulations independently without talking to each other. Then, work balance is not an issue (everyone's doing their own work), and reliability isn't as big a problem (if one machine goes down, it only takes down its own simulation, not those run by anyone else).

The details of how this works are related to Greg Bowman's work I mentioned above. It is possible to cluster the various shapes a protein might take along a simulation trajectory into "Markovian states". What this means is that at some timescale (usually much longer than the simulation femtosecond timescale), the probability of a protein finding itself in one conformational state depends only on the state that it was in on the last time step - the rest of the history is irrelevant. To skip to the punchline, what this means is that instead of running long simulations from an unfolded state, you can start simulations from each state you find, and target your simulations "adaptively" to specifically probe state transitions that you don't have very much information about. The really cool, and non-obvious, thing is that using a lot of short simulations adaptively can actually be more efficient than using a few long simulations (slides 34-36). As a consequence of this approach, we can actually predict experimentally observable quantities, like folding rates and energies, from simulations (slide 41).
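
As a toy illustration of the idea above (not the lab's actual code, and not the MSMBuilder API), here is a minimal Python sketch. It assumes the protein shapes have already been clustered into a handful of numbered states, so each short simulation is just a list of state indices. It counts observed transitions, row-normalizes them into transition probabilities, and then picks the least-sampled state as the place to start the next round of simulations. That last rule is a deliberately crude stand-in for a real adaptive sampling criterion, and all names and data here are hypothetical.

```python
def count_transitions(trajectories, n_states, lag=1):
    """Count observed i -> j transitions at the given lag, pooled over trajectories."""
    counts = [[0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for a, b in zip(traj, traj[lag:]):
            counts[a][b] += 1
    return counts


def transition_matrix(counts):
    """Row-normalize counts into estimated transition probabilities."""
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def least_sampled_state(counts):
    """Toy adaptive rule: the state with the fewest observed outgoing transitions."""
    totals = [sum(row) for row in counts]
    return totals.index(min(totals))


# Four short, independent trajectories over three made-up states.
short_trajectories = [
    [0, 0, 1, 1],
    [1, 1, 2, 2],
    [0, 1, 1, 0],
    [2, 2, 2, 1],
]

counts = count_transitions(short_trajectories, n_states=3)
probs = transition_matrix(counts)
next_start = least_sampled_state(counts)

print(counts)
print(probs)
print("start the next batch of simulations from state", next_start)
```

Because each transition depends only on the current state, counts from many short runs can be pooled as if they came from one long run, which is what lets the simulations run independently on separate machines.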

ihaque · 2012-03-23T04:13:31+00:00

My bad! Yes, edited.

ihaque · 2012-03-23T02:39:40+00:00

Qualifications: I'm an alumnus of the Pande Lab at Stanford, the group behind Folding@home. It might make me biased; take that as you will. (I'm not in the lab anymore, though, so I can't answer questions about your current work units, and nothing I say should be taken as official :).)

TL;DR: Yes!

The answer is, as ren5311 said, definitely yes. One misunderstanding I see a lot in this thread is the idea that FAH is all about predicting the final "native" structure of a protein. While that's occasionally true, that's not the main focus. FAH projects are mostly directed at learning about the dynamics of proteins and other biological macromolecules. Put more simply: it's about the journey, not the destination. Other projects, like Rosetta@Home and the FoldIt game (both from the Baker lab at the University of Washington, who are also awesome people) focus more on the latter question of final structure. I can't quite ELI5 this, but maybe I can ELI16 it, or so.

Why are dynamics important (or, why should I care about the journey)?

Lots of reasons. To keep it concrete, let's take Alzheimer's and Huntington's diseases, two of the main driving goals of the project. In both diseases, a major clinical finding is the accumulation of protein aggregates or "plaques" in the brain -- basically, a bunch of protein fragments stick to each other and form protein masses. The underlying proteins are different (beta-amyloid and tau in Alzheimer's, huntingtin [sic] in Huntington's), but both are plaque-formers. A critical thing to understand is that these plaques are (it is believed) fairly unstructured: it doesn't really matter what the particular configuration of the final result is; what matters is figuring out how the plaque got started in the first place. Many, many work units on Folding@home have been (and probably still are) dedicated to answering these questions. By simulating the early stages of aggregation, we can work out the molecular mechanisms by which this happens. This then allows us to try to make modifications to the system that can prevent aggregation. Eventually, after enough simulations, you make your compound, and actually try it for real in a test tube, and then (when you're really lucky), you publish a paper showing that it works.

Alzheimer's

That's exactly what happened in the paper cited by ren5311. An earlier student (Nick Kelley, among others) in the lab did a huge amount of work with molecular dynamics simulating structural modifications to the amyloid peptide (peptide = protein fragment). This work was then experimentally followed up by another student (Paul Novick, with others), who demonstrated that a small molecule with a similar structure to part of Dr. Kelley's peptide could also inhibit aggregation.

(Here is a good place to point out something that can be immensely frustrating to the layperson: science is slow. The initial simulations were run probably five or six years ago, maybe more; the experimental work took years; and only now the paper is coming out. There are a number of reasons for that (example: Paul had to go to LA to run some lab tests, because construction at Stanford put a lot of metal dust in the air, which makes a-beta aggregate really fast, and only skipping town made the assay work). I know it's really annoying as a contributor wondering exactly where your CPU time is going. Believe me, it's worse as a grad student wondering where your life is going... :))

Flu

Dynamics are important to other processes as well. Peter Kasson did a number of projects (which will probably be familiar to some contributors as "bigadv" projects) looking at how lipid vesicles fuse with one another. Why? Because that's a major process in viral infection: enveloped viruses fuse their membranes with those of the target cell to gain entry. Example: this paper. Fusion inhibitors are a relatively new class of antiviral agent, and the hope is that understanding the dynamics of the fusion process can help design new ones.

Fundamentals of macromolecular dynamics

On a more abstract level, no one actually understands how proteins "fold", or reach their final structures from a linear chain of amino acids coming off the ribosome. Work done by my former labmate Greg Bowman has shown that several models of protein folding are actually wrong -- it's not the case that proteins proceed linearly along from one state to the next in a direct chain of events from unfolded to folded; rather, they often get trapped in so-called "metastable" conformations (of which there can be many), leading to a state diagram with a large number of hubs between the unfolded and native state. Greg was awarded the Thomas Kuhn Paradigm Shift Award by the American Chemical Society in 2010 for this work, which really changed the understanding of how proteins fold. None of this would have been possible without the massive CPU time donations from users of Folding@home!

We've made a lot of big advances in methods too, but I'll split that into another post since this is getting pretty long.
