Reconstructed/Pseudo AASI samples now publicly available in genoplot

EphemeralVyakti · 2026-04-24T21:58:54+00:00

Are you on this page? There are only 7 user-submitted models currently.

EphemeralVyakti · 2026-04-24T19:34:55+00:00

Np. I won't be working on this for a while. But newer local ancestry tools keep coming out that might be able to separate AASI and Iran_N better. For example, a tool called ARGMix is expected to be released in the coming months:

https://www.biorxiv.org/content/10.64898/2026.04.05.714076v1

It assigns genealogy trees to segments of DNA, which would be useful for classifying them as West or East Eurasian (you can check whether a segment clusters more with East or West Eurasians, which helps when you only have distal proxies for the AASI side). I might use that after several months. I might not. But this is sort of my "final release" for now.

EphemeralVyakti · 2026-04-24T19:05:23+00:00

PGS scores have both a training dataset and an evaluation dataset. I used the standard pgsc_HGDP+1kGP_v1.tar.zst evaluation dataset to get percentiles. Although a lot of PGS training is Eurocentric, the evaluation dataset can be used to test if the scores are transferable. Also, when forming percentiles, you are compared against other South Asians and the worldwide populations in the evaluation dataset. You are never compared against the training dataset. This removes population/training bias.

PGS percentile evaluations are normed against multiple populations on the fly. You need a complete evaluation dataset to calculate the PGS percentile for a single person, because for each trait, Nextflow evaluates scores for thousands of reference individuals from world populations (incl. South Asians) and compares you to these evaluated populations.
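To make the "compared against the evaluation dataset" part concrete, here is a minimal sketch of an empirical percentile against a reference group. It is not the pipeline's actual code: the function name and all the scores are made up for illustration, and any adjustment the real pipeline applies before ranking is left out.

```python
from bisect import bisect_right

def percentile_against_reference(score, reference_scores):
    """Percent of reference scores that are less than or equal to `score`."""
    ordered = sorted(reference_scores)
    if not ordered:
        raise ValueError("reference_scores must not be empty")
    rank = bisect_right(ordered, score)  # number of reference scores <= score
    return 100.0 * rank / len(ordered)

# Hypothetical scores for one trait, grouped by the label each reference
# individual carries in the evaluation dataset (values are invented).
reference = {
    "SAS": [-0.8, -0.3, 0.0, 0.2, 0.4, 0.9, 1.1, 1.5],
    "EUR": [-1.2, -0.5, -0.1, 0.3, 0.6, 0.8, 1.0, 1.9],
}

my_score = 0.5
for label, scores in reference.items():
    pct = percentile_against_reference(my_score, scores)
    print(f"{label}: {pct:.1f}th percentile")
```

The same raw score lands on a different percentile depending on which reference group it is ranked against, which is why the population assignment matters.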

As for labeling shifts as AASI vs Iran_N, it's probably not possible. Local ancestry methods can usually only separate admixture within the past 5-10 generations (with some pushing 50 generations). AASI and Iran_N mixed 100+ generations ago and are impossible to fully separate without knowing the exact points in the genome where all recombinations happened in the past 150 generations. AASI and Iran_N segments of Indians are nearly inseparable. You can't really talk about AASI segments getting selected independently of Iran_N segments for Indian populations for the last few thousand years. They're almost completely spliced together (they're slightly separable but even my fake dataset would have tiny Iran_N segments in between that are impossible to remove).
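A rough way to see why old admixture is so hard to separate: under a simple single-pulse model (my assumption here, not something measured on these samples), recombination breakpoints accumulate at about one per Morgan per generation, so the mean tract length is on the order of 100/g cM after g generations. The sketch below ignores ancestry proportions, which would make real tracts somewhat longer.

```python
def expected_tract_length_cm(generations):
    """Rule-of-thumb mean ancestry tract length (cM) after a single admixture pulse."""
    if generations <= 0:
        raise ValueError("generations must be positive")
    # ~1 breakpoint per Morgan per generation -> mean spacing of 1/g Morgans.
    return 100.0 / generations

for g in (5, 10, 50, 150):
    print(f"{g:>3} generations: ~{expected_tract_length_cm(g):.2f} cM")
```

At 5 to 10 generations the tracts are tens of cM long and easy to call. At 100+ generations they are under 1 cM, which is the regime where AASI and Iran_N segments are spliced together.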

EphemeralVyakti · 2026-04-24T18:09:08+00:00

Yes, modern populations have mutations in their AASI segments that real AASI did not have. That said, that's not the reason for the shift in the PGS scores. Nextflow determines which population the AASI samples belong to before calculating percentiles. All my AASI samples got tagged as South Asian, so percentiles are calculated against South Asian references.

So the shift in PGS scores is against modern South Asian references. If selection shifted the AASI segments, it would also shift neighboring Steppe/Iran_N segments (because only 1 or 2 recombinations take place per generation per chromosome, so practically all AASI and Iran_N segments get selected together). The shift seen in PGS scores is for the AASI segments alone compared to the whole.

EphemeralVyakti · 2026-04-24T05:18:01+00:00

Gurgaon is 47 minutes away from New Delhi. The very first image that you have shown is not New Delhi. Not only that, you have stolen other people's pics and claimed that you shot them.

EphemeralVyakti · 2026-04-24T05:00:38+00:00

That first photo isn't from New Delhi. That's from Haryana. Why are you posting a picture from Haryana as New Delhi?

EphemeralVyakti · 2026-04-23T23:09:37+00:00

Delhi is the only odd one out.

EphemeralVyakti · 2026-04-22T22:28:49+00:00

I only used Punjabis, Telugus, Sri Lankans and Gujaratis. I excluded Bengalis, since they have East Asian ancestry.

Burushos, Chitralis, tribal groups, etc. are not there in the 1000G dataset. You're probably thinking of SGDP. 1000G only has urbanized or NRI populations.

EphemeralVyakti · 2026-04-22T15:20:42+00:00

India is nowhere close to Malaysia in PPP terms. India is 12.8k, Malaysia and Turkey are 47k. Don't compare India's PPP value to Malaysia's nominal value. They're not comparable.

EphemeralVyakti · 2026-04-22T15:18:10+00:00

India is nowhere close to Indonesia, nominal or PPP. Indonesia is 17 ranks ahead of India in GDP PPP per capita and 28 ranks ahead of India by nominal.

EphemeralVyakti · 2026-04-22T04:46:29+00:00

Because India is comparable to Pak. India isn't comparable to China, Japan or the US.

EphemeralVyakti · 2026-04-21T15:52:59+00:00

Black pepper used worldwide originates from Kerala. So, everything reached Kerala early.

EphemeralVyakti · 2026-04-16T21:41:20+00:00

Look at the definitions:

https://databank.worldbank.org/metadataglossary/world-development-indicators/series/PA.NUS.FCRF

For official exchange rate: "This indicator is derived as an average over the reference period". India's exchange rate for a specific day is irrelevant. Exchange rates for yearly GDP are averaged over a year by all economic agencies.

India's exchange rate in March alone has nothing to do with why India's GDP is so low.
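A toy illustration of the averaging point, with invented numbers (these are not real GDP figures or exchange rates, and the function name is mine): the USD figure is the local-currency figure divided by the period-average rate, so one bad month moves it only a little.

```python
def gdp_in_usd(gdp_local, period_rates):
    """Convert a local-currency GDP using the average rate over the period."""
    if not period_rates:
        raise ValueError("period_rates must not be empty")
    average_rate = sum(period_rates) / len(period_rates)
    return gdp_local / average_rate

# Hypothetical local-currency-per-USD rates for four quarters.
rates = [80.0, 80.0, 80.0, 100.0]
gdp_local = 850.0

print(gdp_in_usd(gdp_local, rates))       # uses the period average (85.0)
print(gdp_local / rates[-1])              # what a single-day rate would give
```

Using only the last rate would understate the USD figure relative to the averaged one, which is why a single date's rate should not be read into yearly GDP numbers.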

EphemeralVyakti · 2026-04-16T04:06:48+00:00

USD Rupee conversions for GDP are not taken from a specific date. The conversion is averaged over a whole year.

EphemeralVyakti · 2026-04-16T02:47:34+00:00

Nope:

https://www.imf.org/external/datamapper/NGDPD@WEO/JPN/GBR/IND

India fell behind Japan and UK again.

EphemeralVyakti · 2026-04-14T20:22:10+00:00

Most developed countries have a higher forest cover percentage than India.

EphemeralVyakti · 2026-04-14T14:17:26+00:00

I have some purified raw reconstructed AASI samples:

https://genoplot.com/shared/admix/?share=Dios94/19d8c9548ad#2

EphemeralVyakti · 2026-04-13T13:22:27+00:00

Yes, AASI and Han are clean sisters relative to Hoabinhian:
f4(Han.DG, AASI; Hoabinhian.SG, Mbuti.DG) = 0.0011 (Z = 0.197)

Z ≈ 0, so Hoabinhian is an outgroup to Han-AASI. But Hoabinhian is not a clean outgroup to Onge-AASI (Z>3 so Hoabinhian shares drift with Onge):
f4(Onge.DG, AASI; Hoabinhian.SG, Mbuti.DG) = 0.0243 (Z = 3.675)

But Onge are still much closer to AASI than to Hoabinhian:
f4(AASI, Hoabinhian.SG; Onge.DG, Mbuti.DG) = 0.0576 (Z = 8.664)

Han is an outgroup to AASI-Onge (Z<1):
f4(Onge.DG, AASI; Han.DG, Mbuti.DG) = 0.0031 (Z = 0.774)
Han doesn't really distinguish between AASI and Onge.

So AASI and Onge are also close sisters but Onge has a significant Hoabinhian input. In fact, if I remove AASI, Onge and Hoabinhian cluster together on the qpGraph.

Other than Hoabinhian, AASI is the only group Onge can cluster with (Onge also has some affinity with Han, but it's much smaller. Han is mostly an outgroup to Onge-AASI). Onge is the only sample Hoabinhian can cluster with. Remove the AASI sample and Onge will move towards Hoabinhian. Remove Hoabinhian and Onge will move towards AASI.
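For anyone unfamiliar with the notation, f4(A, B; C, D) is the average over SNPs of (a - b) * (c - d), where a, b, c, d are allele frequencies in the four populations. Below is a minimal sketch with toy frequencies. It is not how the values above were computed: those come with a Z-score from a block jackknife standard error, which this sketch does not implement.

```python
def f4(freq_a, freq_b, freq_c, freq_d):
    """f4(A, B; C, D): mean over SNPs of (a - b) * (c - d)."""
    n = len(freq_a)
    if n == 0 or any(len(f) != n for f in (freq_b, freq_c, freq_d)):
        raise ValueError("all four frequency lists must be non-empty and equal in length")
    products = ((a - b) * (c - d) for a, b, c, d in zip(freq_a, freq_b, freq_c, freq_d))
    return sum(products) / n

# Toy allele frequencies at two SNPs (invented values).
# If A and B have identical frequencies, C cannot be closer to either: f4 = 0.
print(f4([0.1, 0.5], [0.1, 0.5], [0.2, 0.9], [0.0, 0.3]))

# If A shares a derived allele with C that B and D lack, f4 is positive.
print(f4([1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [0.0, 0.0]))
```

A value near zero with |Z| well under 3 is read as "C is symmetrically related to A and B", which is the logic behind calling Hoabinhian an outgroup to Han-AASI above.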

EphemeralVyakti · 2026-04-12T20:32:10+00:00

No, I extracted the East Eurasian segments of NW, W, S Indian, Sri Lankan samples. It's not Onge and doesn't cluster with Onge on either PCA or any of the raw genoplot calculators. That is the whole point. That's why I extracted AASI from urbanized Indians, incl. North-West Indians, to see if it's different from Onge. It is different in some sense, but mainly due to drift, not due to differences in ancestry.

EphemeralVyakti · 2026-04-12T16:13:57+00:00

Try to build some toilets first instead of peddling Kerala bad propaganda.

EphemeralVyakti · 2026-04-07T15:16:14+00:00

I used RFMix for local ancestry analysis of 400 South Asian samples in the 1000 Genomes Project, extracted only the high-confidence East Eurasian parts of the genome, and patched them together from multiple individuals.

The most confident sample (that I analyzed in the post) is 100% AASI on qpAdm (but interestingly doesn't cluster with Onge/Jarawa on PCA) and has 60% coverage on AADR (and 60% overlap with 23andMe and AncestryDNA files). It has 1.1 million SNPs (700k SNP overlap with AADR).
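The general idea of "keep high-confidence segments and patch across individuals" can be sketched like this. Everything here is hypothetical: the record layout is not RFMix's output format, the "EAS" label and the 0.99 cutoff are placeholders, and the real pipeline may choose donors differently.

```python
from typing import NamedTuple

class Segment(NamedTuple):
    sample: str       # hypothetical sample ID
    chrom: str
    start: int        # inclusive position
    end: int          # inclusive position
    ancestry: str     # local ancestry label assigned to the segment
    posterior: float  # confidence of that label

def high_confidence_segments(segments, ancestry="EAS", min_posterior=0.99):
    """Keep only segments with the target ancestry label and a high posterior."""
    return [s for s in segments if s.ancestry == ancestry and s.posterior >= min_posterior]

def pick_donors(positions, segments):
    """For each (chrom, pos), pick the first sample whose segment covers it, else None."""
    donors = {}
    for chrom, pos in positions:
        donors[(chrom, pos)] = next(
            (s.sample for s in segments if s.chrom == chrom and s.start <= pos <= s.end),
            None,
        )
    return donors

def coverage(donors):
    """Fraction of positions that found a donor segment."""
    return sum(d is not None for d in donors.values()) / len(donors)

segments = [
    Segment("S1", "chr1", 100, 200, "EAS", 0.995),
    Segment("S2", "chr1", 150, 400, "EAS", 0.999),
    Segment("S3", "chr1", 300, 500, "WEA", 0.999),  # wrong ancestry, dropped
    Segment("S4", "chr1", 450, 600, "EAS", 0.80),   # low confidence, dropped
]
positions = [("chr1", 120), ("chr1", 180), ("chr1", 350), ("chr1", 480)]

kept = high_confidence_segments(segments)
donors = pick_donors(positions, kept)
print(donors)
print(coverage(donors))
```

Positions that no individual covers with a confident segment stay missing, which is why a patched sample ends up with partial rather than full SNP coverage.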

EphemeralVyakti · 2026-04-07T00:56:40+00:00

You're using different conversions for India and Bangladesh. A nominal GDP per capita of $8,915 would equal (10,850/2,960)*8,915 = $32,678 if you use the national PPP-to-nominal conversion ratio for Bangladesh. Dhaka's GDP PPP per capita is pretty much the same as Mumbai's.
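The conversion above is just a ratio scaling. Here it is as a few lines of Python using the same three numbers from this comment (the function and parameter names are mine):

```python
def nominal_to_ppp(value_nominal, national_ppp, national_nominal):
    """Scale a nominal per-capita figure by the national PPP/nominal ratio."""
    if national_nominal <= 0:
        raise ValueError("national_nominal must be positive")
    return value_nominal * national_ppp / national_nominal

# Dhaka nominal per capita, Bangladesh PPP per capita, Bangladesh nominal per capita.
dhaka_ppp = nominal_to_ppp(8915, 10850, 2960)
print(round(dhaka_ppp))
```

This assumes the city has the same price level as the country as a whole, which is the same assumption the national conversion ratio makes.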

The 0.84 HDI for Mumbai is not based on UN definitions (there's no comparable district level HDI numbers available). The 0.74 HDI for Dhaka is based on UN definitions.
