Thumbs Up/Down |
Received: 407/2 Given: 443/1 |
Thank you for posting. I was waiting for her reply. This kit has to be mixed, judging by her East Asian/Siberian level of around 37%, which exceeds even the generic Thakuri types (30-34%). The Khadka Chhetri kit, at the same 37% East Asian proportion, turned out to be a half Indo-Burmese-Newar mix.
She replied that her ancestry is pure Jharra, but I think she has some non-Chhetri ancestry in recent generations that she seems unaware of. Most Chhetri kits score around 23-26% total East Asian, be it a Jharra or a Khatri; some variation would go slightly under 30%, but this one is approaching 40%, so it is a statistical outlier. What do you think, Thambi?
https://docs.google.com/spreadsheets...soQ/edit#gid=0
This Karki kit seems typical, as his East Asian level matches those of the Basnet, Nepal Chhetri (confirmed) and Budhathoki kits. These 4 kits are my primary basis.
# Population Percent
1 S-Indian 35.37
2 Baloch 27.56
3 NE-Asian 18.51
4 NE-Euro 7.3
5 Caucasian 5.29
6 Siberian 3.08
7 American 1.53
8 SE-Asian 0.88
9 Beringian 0.24
10 San 0.23
Single Population Sharing:
# Population (source) Distance
1 nepalese-c (xing) 6.62
2 nepalese-a (xing) 17.29
3 bengali (harappa) 17.86
4 up-muslim (harappa) 18.01
5 bengali-brahmin (harappa) 18.02
6 bihari-muslim (harappa) 18.2
7 kashmiri (harappa) 19.79
8 gujarati-muslim (harappa) 20.21
9 up-brahmin (harappa) 20.32
10 punjabi (harappa) 20.32
Last edited by Kaazi; 10-30-2021 at 02:00 AM.
Thumbs Up/Down |
Received: 4,862/123 Given: 2,945/0 |
Kalashes are ultra-drifted, so they're far from everyone based on IBS and f2 and FST. Based on IBS, Kalashes are even further from French than Turkmens are, but that doesn't mean that they're more Mongoloid than Turkmens:
IBS gives me weird results sometimes, like how here the French are closer to Finns than to Romanians, and Mongols are closer to Negidals than to Buryats:
Code:
$ x=eurasia;plink --bfile $x --genome --out $x
$ awk 'NR==FNR{a[$1]=$2;next}FNR>1{i=a[$2]" "a[$4];b[i]+=$12;n[i]++}END{for(i in b)print b[i]/n[i],i}' $x.{pick,genome}|grep French|sort -n|grep -C5 Kalash
0.802085 French Jew_Cochin
0.802236 Burusho French
0.802316 Saudi French
0.802367 GujaratiA French
0.802372 French BedouinA
0.802463 Kalash French
0.802492 French Turkmen
0.802494 Bashkir French
0.802533 Brahui French
0.802551 French BedouinB
0.802577 Syrian French
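For readers who find the awk one-liner hard to follow, here is a minimal Python sketch of the same idea: average the pairwise IBS value (the DST column of PLINK's `.genome` output) over all sample pairs that belong to a given pair of populations. The function name, the sample IDs and the IBS values are made up for illustration. Unlike the awk version, this sketch sorts each population pair so that "French Kalash" and "Kalash French" end up under one key.

```python
from collections import defaultdict

def mean_ibs_by_pop_pair(pop_of, pairs):
    """Average pairwise IBS per population pair.

    pop_of: dict mapping sample ID -> population label (like the .pick file).
    pairs:  iterable of (iid1, iid2, dst) tuples, e.g. parsed from a .genome file.
    Returns {(pop_a, pop_b): mean dst}, with each key sorted alphabetically.
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for iid1, iid2, dst in pairs:
        key = tuple(sorted((pop_of[iid1], pop_of[iid2])))
        total[key] += dst
        count[key] += 1
    return {key: total[key] / count[key] for key in total}

# Toy data, invented for illustration only (not real IBS values)
pop_of = {"fr1": "French", "fr2": "French", "ka1": "Kalash", "tu1": "Turkmen"}
pairs = [
    ("fr1", "ka1", 0.80),
    ("fr2", "ka1", 0.82),
    ("fr1", "tu1", 0.81),
    ("fr2", "tu1", 0.83),
    ("fr1", "fr2", 0.85),
]

means = mean_ibs_by_pop_pair(pop_of, pairs)
# Print from least to most similar, like `sort -n` in the shell pipeline
for (pop_a, pop_b), value in sorted(means.items(), key=lambda kv: kv[1]):
    print(f"{value:.4f} {pop_a} {pop_b}")
```

Higher values mean more alleles shared, so a population lower in the sorted list is further from French by this measure.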
I think f2 is a better way to calculate the distance between populations:
But in order to estimate the amount of Mongoloid ancestry based on an IBS matrix, one way is to model populations in the matrix with Vahaduo. Or you can also use my two-way function which uses the same method as Vahaduo, so it finds the point between the two source populations which has the lowest Euclidean distance to the target population:
Code:
$ x=eurasia;plink --bfile $x --distance ibs square --out $x
$ Rscript -e 'f="'$x'";t=read.table(paste0(f,".mibs"));fam=read.table(paste0(f,".fam"))[,2];pop=read.table(paste0(f,".pick"),row.names=1)[fam,];tav=function(x,y)data.frame(aggregate(x,list(y),mean),row.names=1);a=tav(t(tav(t,pop)),pop);write.csv(round(a,6),paste0(f,".mibsa"),quote=F)'
$ way2()(awk -F, 'NR<=2{for(i=2;i<=NF;i++)a[NR,i]=$i;next}$1{min=-1;for(r=0;r<=100;r+=1){s=0;for(i=2;i<=NF;i++)s+=($i-((1-r/100)*a[1,i]+(r/100)*a[2,i]))^2;s^=.5;if(min==-1||min>s){min=s;minr=r}}printf"%.3f %s %s\n",min,minr,$1}' "$@")
$ printf %s\\n French Han Finnish Kurd Tharu Kalash Turkmen Lithuanian Sherpa Rai Tamang Gurung Newar|awk -F, 'NR==FNR{a[$1]=$0;next}{print a[$0]}' eurasia.mibsa -|way2|sort -nk2
0.030 1 Lithuanian
0.103 1 Kurd
0.042 8 Finnish
0.044 18 Kalash
0.060 24 Udmurt
0.048 34 Turkmen
0.066 50 Mansi
0.041 64 Newar
0.050 66 Tharu
0.054 83 Mongol
0.042 84 Tamang
0.081 90 Nganasan
0.047 91 Gurung
0.051 94 Rai
0.058 95 Sherpa
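The two-way search described above can also be sketched in Python. The weight of the second source is stepped from 0 to 100 percent, and the step whose blend of the two sources has the lowest Euclidean distance to the target is kept. The name `two_way` and the toy vectors are assumptions for illustration; in the actual run each vector would be one population's row of the averaged IBS matrix, with French and Han as the two sources.

```python
import math

def two_way(target, source_a, source_b):
    """Grid search over 0..100 percent of source_b (the rest is source_a).

    Returns (distance, percent_b) for the blend closest to the target.
    """
    best = None
    for pct in range(101):
        w = pct / 100
        dist = math.sqrt(sum(
            (t - ((1 - w) * a + w * b)) ** 2
            for t, a, b in zip(target, source_a, source_b)
        ))
        # Keep the first minimum found, as the awk version does
        if best is None or dist < best[0]:
            best = (dist, pct)
    return best

# Toy vectors, invented for illustration only
source_a = [0.0, 0.0, 0.0]
source_b = [1.0, 1.0, 1.0]
target = [0.25, 0.25, 0.25]

dist, pct_b = two_way(target, source_a, source_b)
print(f"{dist:.3f} {pct_b} target")
```

Because the model has only two sources, any ancestry that neither source covers gets pushed into whichever source is less distant, which is worth keeping in mind when reading the percentages.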
Last edited by Komintasavalta; 10-30-2021 at 02:16 PM.
Thumbs Up/Down |
Received: 407/2 Given: 443/1 |
Great work man. But Rai/Sherpa aren't 94/95% East Eurasian. They're close to Tibet IA, and Tibet IA (Nepal Chokhopani 2700BP) was 15% para-Onge/South Eurasian (AASI) as per qpGraphs.
Tibet IA is around 12% South Eurasian in G25. So they can't be 95% pure East Eurasian in any way.
Target: NPL_Chokhopani_2700BP
Distance: 3.3505% / 0.03350524
82.8 CHN_Yellow_River_MN
10.6 S_AASI
4.8 RUS_Baikal_N
1.6 IRN_Ganj_Dareh_N
0.2 Dinka
Kalash showing around 20% certainly includes all of their 10-12% South Eurasian/AASI, which shouldn't be counted as East Eurasian imo.
Target: Kalash
Distance: 2.6913% / 0.02691264
43.0 IRN_Ganj_Dareh_N
15.2 RUS_Samara_HG
11.4 AASI_NW
9.6 Anatolia_Barcin_N
8.4 RUS_Tyumen_HG
6.8 GEO_CHG
3.6 TUR_Tepecik_Ciftlik_N
2.0 CHN_Yellow_River_MN
I agree Basque and Sardinian would be better West Eurasian proxies than Balto-Slavs, and some deeply ancient pops from the Levant/Anatolia would be better still, imo.
It seems like South Eurasian is absorbed into the East Eurasian percentages in these calculations. Separating it seems difficult and sadly we don't have any proper South Eurasian proxies; even the most South Eurasian pops (Paniya/Pulliyar) are estimated at only around 70 percent.
Last edited by Kaazi; 10-30-2021 at 04:34 AM.
Thumbs Up/Down |
Received: 7,434/51 Given: 11,086/8 |
Yeah true, it does seem like an outlier to an extent. It's not just the NE Asian: it also has the highest SE Asian among the pure Chhetri kits you have, along with a good amount of Siberian. It does seem to have some recent TB mix tbh. Where in Nepal is she from, did she reply regarding that? I think the eastern Chhetris are more recently mixed with TB groups than the western ones, but I could be wrong.
Thumbs Up/Down |
Received: 407/2 Given: 443/1 |
No, I don't think there's an East-West dynamic for Chhetris. It just depends on the lineage of the person. Those who marry among Jharra and Khatris are the standard Chhetris, while some have Gharti and Matwali Chhetri ancestry, and those mixing with Thakuris do have more irregular non-Khas ancestry and a probably higher East Asian percentage. I'm not basing these things on phenotypes but on oral tradition and info.
It's just subcaste dynamics imo.
Jharra/Khatri -
Jharra are descendants of the Kshatriya (Khas peasant-warrior-administrative-aristocrat-gentry) bulk of the Khas kingdom, who were Hinduized early, around the 12th-14th century, while Khatris are descendants of the not-so-different Khas Brahmin (Bahun) lineage that merged into the Chhetri caste around the 14th-19th century. Jharra outnumber Khatris, and the Khatris' admission to the Chhetri caste is at least 5 generations old, so the estimated Brahmin ancestry is less than 5 percent.
Jharra Chhetris and Pahadi Rajputs of Uttarakhand/East Himachal had already gained 25-30 percent East Asian in the Central Himalayas during their ethnogenesis and are brother tribes.
vs
Gharti & Matwali Chhetris/Thakuri -
Gharti and Matwalis have a dubious origin and used to marry freely with non-Khas folks (like Magars), but later received the "holy thread (Janai) and Chhetri caste" from Jung Bahadur Rana in the 1850s. Thakuris are a 3-way mix of Khas (Indic tribe), Jad/Jaad/Jar/Jariya (proper Tibetan tribe) and Magar (Himalayan Tibetic tribe).
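As a rough check on the "less than 5 percent" Brahmin estimate for Khatri-descended lines, the expected autosomal share from one ancestor halves with each generation. The sketch below assumes a single Brahmin ancestor at a given generation and no Brahmin input afterwards; both the function name and that simplification are assumptions for illustration, and real inherited shares vary around the expectation.

```python
def expected_share(generations_back):
    """Expected autosomal fraction inherited from one ancestor
    who lived the given number of generations ago."""
    return 0.5 ** generations_back

# Print the expected share for one ancestor 1 to 6 generations back
for g in range(1, 7):
    print(f"{g} generations: {100 * expected_share(g):.3f}%")
```

Under these assumptions, one ancestor five generations back contributes 1/32 of the genome on average, which is in line with the figure quoted in the post.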
Edit: There could be only one geographical dynamic, i.e. lower river basin valley Chhetris versus high altitude Chhetris. Those in the lower river basins live with the Bahuns, who are there in huge numbers and outnumber Chhetris in most places, while those in the high altitude hills have exposure to Sino-Tibetan tribes. I've seen that the lower river basin valley ones are more ethno-religious than those at high altitude.
Last edited by Kaazi; 10-30-2021 at 05:50 AM.
Thumbs Up/Down |
Received: 407/2 Given: 443/1 |
@Thambi though there are vast irregularities among the GEDmatch Pahadi Rajputs, the 8 high East Asian Rajput Rajasthan outliers from Mondal et al 2016 (80 percent coverage) do show their commonality and overlap with Chhetris.
Yesterday, I separated the Rajput East Asian outliers (25-30 East Asian types only) and averaged them in HarappaWorld and also in G25. They're quite close to Chhetris and the average East Asian percentages match exactly lol.
Target: Rajput_Rajasthan_o
Distance: 1.7429% / 0.01742949
51.8 IRN_Shahr_I_Sokhta_BA3
29.4 NPL_Chokhopani_2700BP
14.6 RUS_Srubnaya_Alakul_MLBA
4.2 UZB_Bustan_BA
I only partially rely on Nepalese C due to its very poor coverage (10 percent).
Target: Nepalese_C
Distance: 2.6131% / 0.02613052
47.0 IRN_Shahr_I_Sokhta_BA3
29.4 NPL_Chokhopani_2700BP
16.6 RUS_Srubnaya_Alakul_MLBA
7.0 UZB_Bustan_BA
Code:
Rajput-Rajasthan-outlier-average (8 kits)
S-Indian: 34.61%
Baloch: 26.14%
Caucasian: 4.44%
NE-Euro: 6.57%
SE-Asian: 1.54%
Siberian: 2.47%
NE-Asian: 20.99%
Papuan: 0.66%
American: 0.45%
Beringian: 1.35%
Mediterranean: 0.58%
SW-Asian: 0.14%
San: 0.04%
E-African: 0.04%
Pygmy: 0.00%
W-African: 0.00%
Both certainly look like a single caste based on East Asian percentage, and the West-South Eurasian variation doesn't necessarily indicate much differential ancestry. This was the standard Himalayan Kshatriya (29 East Asian) in the medieval West Nepal-Uttarakhand belt, as all Bahun-Chhetris originated there. Later some gained extra East Asian through Magars and Jad Bhotiyas imo.
Code:
Nepali-Khas-Chhetri (8 kits)
S-Indian: 32.71%
Baloch: 26.11%
Caucasian: 5.08%
NE-Euro: 7.71%
SE-Asian: 1.55%
Siberian: 3.68%
NE-Asian: 19.61%
Papuan: 0.64%
American: 1.05%
Beringian: 0.71%
Mediterranean: 0.65%
SW-Asian: 0.42%
San: 0.04%
E-African: 0.00%
Pygmy: 0.02%
W-African: 0.02%
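The two averaged HarappaWorld profiles can be compared numerically with a short Python sketch. It sums the East Asian related components and takes the Euclidean distance across all 16 components. Which components count as "East Asian" is an assumption here (NE-Asian, SE-Asian and Siberian), since the post does not spell out the sum it uses; the function names are also just illustrative.

```python
import math

# Averages copied from the two profiles in the post (percent)
rajput_o = {
    "S-Indian": 34.61, "Baloch": 26.14, "Caucasian": 4.44, "NE-Euro": 6.57,
    "SE-Asian": 1.54, "Siberian": 2.47, "NE-Asian": 20.99, "Papuan": 0.66,
    "American": 0.45, "Beringian": 1.35, "Mediterranean": 0.58,
    "SW-Asian": 0.14, "San": 0.04, "E-African": 0.04, "Pygmy": 0.00,
    "W-African": 0.00,
}
chhetri = {
    "S-Indian": 32.71, "Baloch": 26.11, "Caucasian": 5.08, "NE-Euro": 7.71,
    "SE-Asian": 1.55, "Siberian": 3.68, "NE-Asian": 19.61, "Papuan": 0.64,
    "American": 1.05, "Beringian": 0.71, "Mediterranean": 0.65,
    "SW-Asian": 0.42, "San": 0.04, "E-African": 0.00, "Pygmy": 0.02,
    "W-African": 0.02,
}

# Assumption: these three components are what "East Asian" refers to
EAST_ASIAN = ("NE-Asian", "SE-Asian", "Siberian")

def east_asian_total(profile, components=EAST_ASIAN):
    """Sum of the chosen East Asian related components."""
    return sum(profile[c] for c in components)

def euclidean(p, q):
    """Euclidean distance across all components of two profiles."""
    return math.sqrt(sum((p[k] - q[k]) ** 2 for k in p))

print(f"Rajput outliers East Asian: {east_asian_total(rajput_o):.2f}")
print(f"Khas Chhetri East Asian:    {east_asian_total(chhetri):.2f}")
print(f"Euclidean distance:         {euclidean(rajput_o, chhetri):.2f}")
```

Adding Beringian and American to the tuple changes both totals by similar amounts, so the choice of components matters less for the comparison than for the absolute percentage.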
Removing the low East Asian samples brings the Nepalese C and Rajput outliers together and moves the high IranN Bahun o samples away.
Code:
Rajput_Rajasthan,0.0671555,-0.050776,-0.1095535,0.07832750000000001,-0.062011500000000004,0.0485265,0.0016449999999999998,0.003922999999999999,0.009407999999999998,-0.001002,-0.0064139999999999996,-0.002248,0.0033445000000000003,0.0008255000000000002,0.0036645,-0.0033810000000000003,-0.0056064999999999995,-0.0006335000000000001,-0.003205,-0.0082535,0.0006865000000000002,-0.0029054999999999997,0.0020954999999999997,0.0034945,0.0026345
Rajput_Rajasthan_o,0.0492285,-0.14318974999999998,-0.097815625,0.06371175000000001,-0.04677787499999999,0.04709750000000001,-0.00146875,0.012028375000000001,0.017998125000000004,0.011344375,-0.029575125,-0.002248,0.002304375,-0.004816875000000001,-0.001035125,-0.0028175,0.004156125,-0.00026925000000000007,-0.0011783750000000006,0.0025324999999999996,0.00140375,0.0057345,0.0025727500000000004,0.00516625,0.009011
The low East Asian Rajputs are closer to Brahmins. That is better and closer to the truth, as real plains Rajputs would definitely have no East Asian, or at least very little, and would be closer to Brahmins in Euclidean distance. I'm now trying to update the Eurogenes dataset with these 2 groups.
Further I've simulated and found there are some overlaps between the two groups here.
https://anthrogenica.com/showthread....l=1#post806917
Thumbs Up/Down |
Received: 1,250/11 Given: 524/7 |
Kalash drift only affects Admixture and PCA. The reason you’re having issues is because you’re not doing it right.
The guy who taught me most of what I know (dilawer) told me a long time ago that when you do IBS you have to be very strict with quality control. He said to follow these things and you won't have issues:
1- Datasets are very biased towards certain SNPs. Don't mix 1000 Genomes with HGDP with Simons. If you have test samples you're looking at, run them separately with 1000G, then Simons, then HGDP and so on. So first do Simons, for example. Simons usually has 2 or 3 samples per ethnicity. Let's say French1 and French2. Sort the table for French1, then French2, and then average. He sent me the Linux code he uses to automate it. I'll see if I can find it.
2- Make sure you use --geno 0.0001 so that you only use 100% overlapping SNPs
3- You can then normalize your results
If you do the above you'll end up with 600,000 or 800,000 overlapping SNPs after you throw out the 1000G samples with lots of missing SNPs (check with --missing to see which samples to throw out before using the --geno flag)
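To make the effect of the strict missingness filter concrete, here is a small Python sketch on toy genotypes. It keeps only SNPs called in every sample, then computes a pairwise IBS similarity as the proportion of alleles shared, which is meant to correspond to how PLINK defines its DST value. The function names and the genotype data are invented for illustration; this is not PLINK's implementation.

```python
def complete_snps(genotypes):
    """Indices of SNPs with no missing call in any sample.

    genotypes: dict sample -> list of allele counts (0, 1, 2) or None if missing.
    """
    n_snps = len(next(iter(genotypes.values())))
    return [
        j for j in range(n_snps)
        if all(calls[j] is not None for calls in genotypes.values())
    ]

def ibs_similarity(g1, g2, keep):
    """Proportion of alleles shared identical by state over the kept SNPs."""
    diff = sum(abs(g1[j] - g2[j]) for j in keep)
    return 1 - diff / (2 * len(keep))

# Toy genotypes, invented for illustration only
genotypes = {
    "A": [0, 1, 2, None],
    "B": [0, 2, 2, 1],
    "C": [2, 1, None, 0],
}

keep = complete_snps(genotypes)
print("SNPs kept:", keep)
print("IBS A-B:", ibs_similarity(genotypes["A"], genotypes["B"], keep))
print("IBS A-C:", ibs_similarity(genotypes["A"], genotypes["C"], keep))
```

A single sample with many missing calls shrinks the kept set for everyone, which is why the post suggests dropping low-coverage samples before applying the SNP filter.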
When you do these things you'll have a nice list that makes sense, maybe something like the one below. I'm just making up numbers. These are not ancestry but genetic similarity numbers:
700K SNPs
Han avg 0.82 100%
..
..
Dai avg 0.81 95%
..
..
Uyghur 0.80 78%
..
..
Kurds 0.78 45%
..
Kalash 0.779 43%
..
Adygei
..
..
Abkhasian
..
..
Estonian
..
..
..
Jordanians
..
..
..
Yoruban 0.72 0%
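One plausible reading of "normalize your results" is a min-max rescale, so that the closest population gets 100% and the furthest gets 0%. The Python sketch below assumes that reading; the post does not say which normalization is meant, and the function name is hypothetical. The input values are the made-up similarity numbers from the list above.

```python
def min_max_percent(scores):
    """Linearly rescale similarity scores so the maximum maps to 100
    and the minimum maps to 0."""
    lo = min(scores.values())
    hi = max(scores.values())
    return {name: 100 * (value - lo) / (hi - lo) for name, value in scores.items()}

# Illustrative similarity values taken from the made-up list in the post
scores = {
    "Han": 0.82, "Dai": 0.81, "Uyghur": 0.80,
    "Kurds": 0.78, "Kalash": 0.779, "Yoruban": 0.72,
}

normalized = min_max_percent(scores)
for name, pct in sorted(normalized.items(), key=lambda kv: -kv[1]):
    print(f"{name:8s} {scores[name]:.3f} {pct:5.1f}%")
```

A linear rescale of these values does not reproduce the percentages in the list exactly (Dai comes out near 90 rather than 95), which fits with the numbers there being placeholders rather than real results.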
He also said each dataset is screwed up in one way or another. I remember he said 23andme SNPs have a huge bias against African admixture. With 23andme SNPs you get something like
Yoruban 100%
..
E. Asians
..
S. Asians
..
W. Asians
Which is screwed up
Thumbs Up/Down |
Received: 4,862/123 Given: 2,945/0 |
Yeah maybe the reason why in my IBS run the French were closer to Finns than to Romanians was that all of the Romanian samples had under 580,000 SNPs out of a maximum of 597,573. When I made another dataset with just Finnish, Romanian, and French samples, and I used `--geno 0` to remove every SNP that was missing from any sample, then the French were closer to Romanians than to Finns.
If you use `--geno 0` so there is no SNP with missing data, then does it matter if you combine samples from different sources? Even though I used `--geno 0`, I was now able to make a dataset with 387,406 SNPs and 568 samples, because I only used samples with at least 588,000 SNPs, and I only selected up to 4 samples from each population based on which samples had the highest SNP count. But even now it's weird how Forest Yukaghirs are so close to the French, and the French are still closer to Finns than to Bulgarians:
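The sample selection described above (only samples with at least 588,000 SNPs, and at most 4 per population, preferring the highest SNP counts) can be sketched in Python like this. The function name, the sample IDs and the SNP counts are invented for illustration; in practice the counts could come from a per-sample missingness report.

```python
from collections import defaultdict

def pick_samples(samples, min_snps=588_000, per_pop=4):
    """Keep up to `per_pop` samples per population, highest SNP count first,
    ignoring samples below `min_snps`.

    samples: iterable of (sample_id, population, snp_count) tuples.
    Returns dict population -> list of chosen sample IDs.
    """
    by_pop = defaultdict(list)
    for sample_id, pop, n_snps in samples:
        if n_snps >= min_snps:
            by_pop[pop].append((n_snps, sample_id))
    return {
        pop: [sample_id for _, sample_id in sorted(rows, reverse=True)[:per_pop]]
        for pop, rows in by_pop.items()
    }

# Toy data, invented for illustration only
samples = [
    ("fr1", "French", 590_000), ("fr2", "French", 595_000),
    ("fr3", "French", 589_000), ("fr4", "French", 596_000),
    ("fr5", "French", 597_000), ("fr6", "French", 570_000),
    ("ro1", "Romanian", 579_000), ("ro2", "Romanian", 575_000),
    ("fi1", "Finnish", 594_000),
]

picked = pick_samples(samples)
for pop, ids in picked.items():
    print(pop, ids)
```

A population whose samples all fall below the threshold drops out entirely, so it is worth checking which populations survive before interpreting the distances.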
Last edited by Komintasavalta; 10-30-2021 at 02:25 PM.