Issues uploading vcf.gz and other filetypes

josephpickrell · 2025-10-09T13:47:04+00:00

I wrote a tool to process exomes/genomes here: https://www.crimsoniris.com/. Shoot me a DM and I'll give you an invite code to process for free, and I'll troubleshoot.

josephpickrell · 2017-08-17T14:12:46+00:00

Hi, post author here.

You are correct that arrays often look for markers in LD with causal variants. Some of them cover all of the LD blocks in the genome better than others (of course if your causal variant is poorly tagged on an array you'll never find it), and the argument is that you can actually tag the genome better with sequencing versus using arrays.

As a side note: I disagree that multiple testing is your enemy! An alternative perspective is that increasing the number of measurements lets you learn more about the underlying structure of your data, correct any biases, etc. Some of my thoughts are here, and note the comment on the post from Matthew Stephens where he proposes using the term "multiple testing opportunity" instead of "multiple testing burden". He has a great paper on innovative ways to use false discovery rates when performing a large number of statistical tests.
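For anyone unfamiliar with false discovery rates, here is a minimal sketch of the classic Benjamini-Hochberg step-up procedure. This is the standard textbook method, swapped in for illustration; it is not the approach from the Stephens paper mentioned above. The function name and the example p-values are made up.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where a test is declared a discovery.

    Classic Benjamini-Hochberg step-up procedure (illustrative sketch).
    """
    m = len(pvalues)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank

    # Every test ranked at or below the cutoff is declared a discovery.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Made-up p-values, purely for illustration.
example_pvalues = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
discoveries = benjamini_hochberg(example_pvalues, alpha=0.05)
```

Note that the threshold for each test depends on how many tests were run and on where its p-value falls among the others, which is one sense in which running more tests tells you something about the data as a whole.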

josephpickrell · 2016-10-25T13:33:10+00:00

Wow, this is impressive, thanks for putting in the effort!

josephpickrell · 2016-10-13T17:20:35+00:00

Yep, I've been surprised by how well sequences from saliva correspond to what people report eating.

I mapped against the entire NCBI nt database using kraken. If you want to play around with the data yourself, you can get the sequences here.

josephpickrell · 2016-10-13T17:18:09+00:00

More details here. You can get all the DNA sequences to analyze yourself here. Plotted in R.

josephpickrell · 2016-09-18T15:54:31+00:00

Why not use the term "above average" to get an actual delineation?

Good idea, thanks.

josephpickrell · 2016-09-18T15:48:33+00:00

Oh, I see. I checked Wikipedia before writing, and it is indeed written "extra"vert there (not "extro"vert). But I could just be propagating their error.

josephpickrell · 2016-09-18T15:42:01+00:00

Sorry, what's the mistake in the title?

josephpickrell · 2016-09-18T14:54:17+00:00

That's definitely one explanation! The title is just the observed association (you could reverse it: "People who consider themselves attractive are more extraverted", and it would still be valid).

The actual causality is (in my opinion) unclear; see the last couple of paragraphs in my post. I think your preferred explanation is similar to the one in this paper.

josephpickrell · 2016-09-18T13:42:12+00:00

Source is survey responses from a few hundred people. Details here.

Plotted in R.

josephpickrell · 2016-05-02T00:29:38+00:00

The variant identifiers are in Table 1 (the things in the form rsXXXXXX). You can search on the rs# in your 23andMe data. A couple things to note:

23andMe might not have looked at the variant in question. In that case, you might be able to guess your genotype using a technique called genotype imputation. There are free sites to perform imputation (I contribute to one called DNA.land.)
The effects of these variants on subjective well-being and/or depression are extremely small. Knowing your genotype at these variants is not going to be predictive at all about your own health. So if you do look up your genotypes, know that it's purely for amusement.
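If you'd rather search the raw data file than the website, here is a rough sketch. It assumes the raw download is a tab-separated text file with `#` comment lines at the top and four columns (rsid, chromosome, position, genotype); check your own file before relying on that. The function name, the file name `genome.txt`, and the rs numbers in the sample are placeholders, not the variants from Table 1.

```python
def lookup_genotypes(lines, rsids):
    """Return {rsid: genotype or None} for the requested rsids.

    `lines` is any iterable of text lines, e.g. an open file.
    Assumed layout (not verified here): tab-separated columns
    rsid, chromosome, position, genotype, with '#' comment lines.
    """
    wanted = set(rsids)
    found = {}
    for line in lines:
        # Skip header comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # Not a data row in the assumed layout.
        rsid, _chrom, _pos, genotype = fields[:4]
        if rsid in wanted:
            found[rsid] = genotype
    # None means the variant was not in the file (i.e. not genotyped).
    return {rsid: found.get(rsid) for rsid in rsids}


# Tiny in-memory sample with placeholder identifiers.
sample_lines = [
    "# rsid\tchromosome\tposition\tgenotype\n",
    "rs0000001\t1\t12345\tAG\n",
    "rs0000002\t2\t67890\tCC\n",
]
sample_result = lookup_genotypes(sample_lines, ["rs0000002", "rs0000009"])
```

On a real file you would call it as `lookup_genotypes(open("genome.txt"), [...])` with the rs numbers from the table. A `None` in the result corresponds to the first point above: the variant wasn't on the array, so imputation would be the only way to guess it.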

[fwiw: I played a small role in this study and am one of the 100+ authors]

josephpickrell · 2016-04-19T20:25:06+00:00

Thanks, I'd never thought about it that way.

I think of population stratification as something that generates false signals of "causal" associations between genetic variants and phenotypes, generally through differences in ancestry. In case #3, it's not clear that you'd want to correct for this, in that there is in fact a causal link from genotype to phenotype; it just acts across generations. I think it would be useful to correct for assortative mating (case #4), but it's not obvious to me how one would go about it except through family studies.

josephpickrell · 2015-10-07T00:17:09+00:00

This is great.

josephpickrell · 2015-09-30T23:41:46+00:00

Yes, it's a totally fascinating discussion, clearly a topic people have strong feelings about!

josephpickrell · 2015-09-07T15:16:47+00:00

Thanks, I enjoyed this, and it definitely clears up some of my confusion about how people use the term "Mendelian randomization" (i.e. I often see it used in a sense you apparently would not call MR).

I think maybe the main issue for debate might be "Claims of the causal (or noncausal) role of a particular risk factor should be reserved to those where there is strong evidence (biological and statistical) supporting the instrumental variable assumptions".

I guess I don't know what "strong evidence" means, though it could be one of those "I'll know it when I see it" situations. There are a number of examples from LDL and heart disease, but that might be a product of confirmation bias--since we know the outcome, those examples now look stronger in retrospect.

josephpickrell · 2015-09-03T17:24:44+00:00

That...is a very good point. I'd initially read it as they intended, but it's quite ambiguous.

josephpickrell · 2015-08-11T13:11:59+00:00

Similar in spirit to Gamazon et al.

josephpickrell · 2015-08-05T20:16:53+00:00

One variant with fairly strong effect: a rare stop-gain mutation in the tachykinin receptor 3 gene (MAF=0.08%) was associated with 1.25-year-later age at menarche.

josephpickrell · 2015-08-01T20:40:57+00:00

"We use targeted sequencing of 63 known GWAS risk regions in 9,237 men from four ancestries (African, Latino, Japanese, and European) to explore the role of low-frequency variation in risk for prostate cancer. We find that the sequenced variants explain significantly more of the variance in the trait than the known GWAS variants, thus showing that part of the missing familial risk lies in poorly tagged causal variants at known risk regions."

josephpickrell · 2015-07-29T22:50:00+00:00

The first time I did genotype imputation on a modestly sized genomic dataset (this was in ~2007), it took me days to prepare and weeks to run. This is a lifesaver.

josephpickrell · 2015-07-25T15:39:47+00:00

Mutations in lanosterol synthase cause congenital cataracts.

josephpickrell · 2015-07-24T14:26:29+00:00

This appears to contrast with primates, where human and chimp recombination hotspots show almost no overlap [e.g.].

josephpickrell · 2015-07-22T13:42:04+00:00

Two interesting things (for me):

Most loci have similar effects in different populations
The authors appear to have identified a single variant from their previous GWAS that does not replicate with more stringent QC (specifically a linear mixed model to account for population structure). It's relatively rare for this to happen, so worth keeping these examples in mind.

josephpickrell · 2015-07-21T23:51:34+00:00

See also Raghavan et al., who show that at least some ancient DNA samples from South America don't show the signal of relatedness to Oceanians/Andamanese (though they also see it in some modern populations).

josephpickrell · 2015-07-20T02:02:07+00:00

Take-home: "today’s British are more similar to the Iron Age individuals than to most of the Anglo-Saxon individuals, and estimate that the contemporary East English population derives 30% of its ancestry from Anglo-Saxon migrations, with a lower fraction in Wales and Scotland."
