I didn't know Udmurts had such high Mongoloid ancestry, considering how common red hair is among them.
Here is the supplement:
https://www.biorxiv.org/content/bior...?download=true
Here is the paper:
https://www.biorxiv.org/content/10.1...555v1.full.pdf
Here's additional proof that something is not right with G25. Eurasians such as Kurds should be closest to other Eurasians, not to Africans.
G25 wrongly shows Kurds closer to Yoruba and Esan than to Papuans, which is absurd. It also wrongly shows Kurds closer to Sudanese than to Karitiana and Surui.
G25 further shows Kurds closer to Jordanians than to E. Europeans and Uyghurs, which is also wrong. I could go on and on about the wrong rankings in G25.
Kurdish G25, distance to:
NO POPULATION DISTANCE
1 Turkish_Kayseri 0.04594
2 Armenian_B 0.04996
3 Abkhasian 0.07100
4 Adygei 0.07185
5 Chechen 0.07279
6 Jordanian 0.09159
7 Balochi 0.12169
8 Albanian 0.12363
9 Brahui 0.12457
10 Bulgarian 0.13177
11 French_Al 0.16473
12 BedouinB 0.16728
13 Hungarian 0.16929
14 Czech 0.18128
15 Basque_French 0.19215
16 Finnish 0.21537
17 Mozabite 0.23311
18 Saharawi 0.26496
19 Uygur 0.28771
20 Hazara 0.28992
21 Kirghiz 0.39622
22 Jarawa 0.42858
23 Somali 0.43369
24 Mongolian 0.46764
25 Mongola 0.55815
26 Eskimo_Sireniki 0.56139
27 Japanese 0.58489
28 Sudanese 0.69730
29 Karitiana 0.71006
30 Surui 0.71489
31 Yoruba 0.74242
32 Esan_Nigeria 0.74434
33 Papuan 0.78951
34 Khomani_San 0.83812
35 Ju_hoan_North 0.90933
36 Mbuti 0.92566
Unlike G25, the PLINK IBS gene-to-gene comparison correctly shows Kurds closer to other Eurasians (Papuans, Karitiana, Surui) than to Sub-Saharan Africans (SSA). It also correctly shows Kurds closer to E. Europeans, Baloch, Brahui, Hazara and Uyghurs than to Jordanians, and so on. Note that DST here is a similarity score, so a higher value means a closer population.
NO POPULATION DST
1 Lezgin 0.85119
2 Armenian 0.85040
3 Adygei 0.85039
4 Abkhasian 0.85027
5 Turkish-Kayseri 0.85012
6 Chechen 0.84983
7 Czech 0.84973
8 Hungarian 0.84956
9 Bulgarian 0.84940
10 French 0.84880
11 Basque 0.84860
12 Finnish 0.84860
13 Russian 0.84855
14 Estonian 0.84832
15 Sardinian 0.84817
16 Polish 0.84797
17 Pathan 0.84782
18 Tajik 0.84777
19 Kalash 0.84722
20 Sindhi 0.84702
21 Jew_Yemenite 0.84700
22 Tlingit 0.84695
23 Balochi 0.84675
24 Brahui 0.84615
25 Brahmin 0.84608
26 Samaritan 0.84603
27 BedouinB 0.84589
28 Saami 0.84589
29 Uyghur 0.84578
30 Makrani 0.84567
31 Mansi 0.84565
32 Bengali 0.84557
33 Punjabi 0.84517
34 Hazara 0.84498
35 Kyrgyz_Kyrgyzstan 0.84454
36 Jordanian 0.84422
37 Mala 0.84288
38 Tubalar 0.84250
39 Irula 0.84181
40 Even 0.84074
41 Mongola 0.84070
42 Tu 0.84029
43 Hezhen 0.84020
44 Mixtec 0.84018
45 Yakut 0.84000
46 Burmese 0.83998
47 Mexico_Zapotec.DG 0.83971
48 Xibo 0.83970
49 Naxi 0.83951
50 Han 0.83945
51 Korean 0.83923
52 Japanese 0.83898
53 Mayan 0.83886
54 Khonda_Dora 0.83884
55 Daur 0.83884
56 Tujia 0.83882
57 Quechua 0.83881
58 Eskimo_Sireniki.DG 0.83873
59 Oroqen 0.83861
60 Ulchi 0.83859
61 Eskimo_Naukan.DG 0.83855
62 She 0.83853
63 Miao 0.83845
64 Yi 0.83844
65 Itelmen 0.83824
66 Mixe 0.83819
67 Kinh 0.83813
68 China_Lahu 0.83783
69 Pima 0.83775
70 Thai 0.83774
71 Eskimo_Chaplin.DG 0.83767
72 Cambodian 0.83766
73 YANA_UP_WGS 0.83735
74 Dai 0.83730
75 Kusunda 0.83724
76 Piapoco 0.83703
77 Ami.DG 0.83696
78 Karitiana 0.83687
79 Surui 0.83654
80 Igorot 0.83649
81 Dusun 0.83639
82 Saharawi 0.83398
83 Mozabite 0.83287
84 Bougainville 0.83084
85 Papuan 0.82871
86 Somali 0.81444
87 Masai 0.80654
88 BantuKenya 0.79064
89 Luo 0.79045
90 Gambian 0.78966
91 Luhya 0.78919
92 Mandenka 0.78855
93 Esan 0.78710
94 Mende 0.78708
95 Yoruba 0.78690
96 Biaka 0.78118
97 Mbuti 0.77853
98 Ju_hoan_North 0.77354
99 Khomani_San 0.77330
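For readers unfamiliar with the DST column, here is a minimal Python sketch of how an IBS similarity of this kind can be computed for one pair of individuals. It assumes the values come from PLINK's `--genome` report, where DST = (IBS2 + 0.5 * IBS1) / number of SNPs called in both samples. The genotype vectors and the function name `ibs_similarity` are made up for illustration, and the post does not say how per-pair values were averaged into the population rows.

```python
def ibs_similarity(g1, g2):
    """PLINK-style DST for two genotype vectors.

    Genotypes are coded as 0, 1 or 2 copies of an allele; None marks a
    missing call. SNPs missing in either sample are skipped.
    """
    ibs2 = ibs1 = called = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue
        called += 1
        diff = abs(a - b)
        if diff == 0:
            ibs2 += 1      # both alleles shared
        elif diff == 1:
            ibs1 += 1      # one allele shared
        # diff == 2: no alleles shared, contributes nothing
    if called == 0:
        raise ValueError("no SNPs called in both samples")
    return (ibs2 + 0.5 * ibs1) / called


# Hypothetical toy genotypes, not taken from any real sample.
sample_a = [0, 1, 2, 2, None]
sample_b = [0, 2, 0, 2, 1]
print(ibs_similarity(sample_a, sample_b))  # 0.625
```

Because every SNP contributes between 0 and 1, the score is bounded and higher means more similar, which is why the table is sorted in descending order.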
One way to reword what you just said: a one-to-one, gene-to-gene comparison using IBS is a more accurate method than G25 or an Admixture calculator for determining genetic similarity between two populations, say Kurds and Bulgarians or Mongolians.
I'm reminded of something Dilawer told me a while back. He said Admixture and PCA-based methods don't portray genetic similarity between two populations as accurately as a one-to-one IBS comparison. They just cluster based on geography and not on genes. That's partly why individuals in a population can have all sorts of phenotypes while Admixture or PCA still clusters them together.
Although PCA or Admixture places Kurds or Poles within clusters, running IBS on individual Poles or Kurds may show widely differing genetic similarity to Siberians or E. Asians. Admixture and G25 results depend on which components the calculator uses or which samples went into the G25 PCA. By contrast, IBS results do not depend on any of this and are unaffected by which other samples are included.
IBS may in fact align more closely with phenotypes than G25 or Admixture results do, since those place the Poles or Kurds within clusters that do not explain their individual phenotypes the way IBS would.
It's from this post by Razib Khan: https://www.gnxp.com/WordPress/2018/...n-one-command/.
Khvorykh et al. 2020 even did admixture-style analysis based on the number of shared IBD segments: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7696950/:
The fourth stage of our computations is unique to this research and was absent in Fedorova et al. 2016. In this stage, we created Supplementary Table S4 using the program rankingATLAS2_v9.pl, and the data from the Supplementary Table S1 ("IBD Normalized Numbers"). Supplementary Table S4 presents the percentages of relative relatedness of each population to the nine Distinct Human Genetic Regions (DHGRs) (AFE, AFW, AMR, EUR, ARC, EAS, OCE, SAS, and MDE, see Results section). For each population (e.g., Georgia) the program counts the numbers of shared IBD fragments per pair of individuals for this population with the three representatives of DHGR region and then makes a sum of these three numbers. For example, the for the AFE region, the summing number of shared IBDs will be the following: 0.48 IBDs (per pair for Georgia vs. LWK) + 0.92 (Georgia vs. Din_AFR) + 3.12 (Georgia vs. Mas_AFR) = 4.52 (for the AFE group). And so on for each DHGR group. In order to minimize the Founder effect in our calculations, we created an upper threshold of 100 shared IBD segments for any populational pair. For example, in a calculation of Congo (Con_AFR) vs. LWK, the original value was 151.9, however, with the threshold in place, the program changed the value to 100). Finally, we calculated the relative percentages for all 9 components (AFE, AFW, AMR, EUR, ARC, EAS, OCE, SAS, and MDE) in a way that ensured their sum was always 100%. Ranking data for each population (as presented in Table 2) were also obtained by rankingATLAS2_v9.pl.
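The arithmetic in the quoted methods can be sketched in a few lines of Python. This is only an illustration of the description, not the authors' rankingATLAS2_v9.pl: the function names are made up, only the AFE numbers for Georgia come from the quoted text, the other entries are hypothetical placeholders, and just three of the nine regions are shown.

```python
CAP = 100.0  # upper threshold per population pair, as in the quoted methods


def region_sums(ibd_by_region, cap=CAP):
    """Sum the capped IBD sharing with each region's three representatives."""
    return {
        region: sum(min(value, cap) for value in values)
        for region, values in ibd_by_region.items()
    }


def to_percentages(sums):
    """Rescale the region sums so that they add up to 100%."""
    total = sum(sums.values())
    return {region: 100.0 * s / total for region, s in sums.items()}


georgia = {
    "AFE": [0.48, 0.92, 3.12],    # Georgia vs. LWK, Din_AFR, Mas_AFR (quoted)
    "EUR": [20.0, 25.0, 30.0],    # hypothetical placeholder values
    "MDE": [15.0, 151.9, 10.0],   # hypothetical; 151.9 gets capped to 100
}

sums = region_sums(georgia)
shares = to_percentages(sums)
for region in georgia:
    print(region, round(sums[region], 2), round(shares[region], 1))
```

The cap is applied to each pair before summing, so a single heavily related pair cannot dominate a region's total.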
Here's a graph I made of some populations from Khvorykh's table S4:
Code:
# download table S4 and strip carriage returns
curl -Ls pastebin.com/raw/BmNdqWvi|tr -d \\r>/tmp/tables4
# populations to plot, in display order
printf %s\\n Sau_MDE Ira_MDE Rom_EUR Gre_EUR Ger_EUR GBR_EUR Swe_EUR Lat_EUR Rus_EUR Est_EUR Fin_EUR FIN_EUR Ing_EUR Kar_EUR Vep_EUR Saa_EUR Mor_EUR Kom_EUR Udm_EUR Mar_EUR Mis_EUR Kry_EUR Tat_EUR Chu_EUR BSh_EUR Man_SIB Kha_SIB Tun_SIB For_SIB Nen_SIB Nga_SIB Bur_SIB Yak_SIB Ale_ARC>/tmp/pop
# keep the header plus the selected rows, then reorder the columns
awk -F, 'NR==1{print;next}NR==FNR{a[$1]=$0;next}$1 in a{print a[$1]}' /tmp/tables4 /tmp/pop|awk -F, -v OFS=, '{print$2,$6,$11,$10,$7,$8,$5,$9,$3,$4}'>/tmp/a
R -e 'library("ggplot2")
library("reshape2")
library("forcats") # provides fct_rev
t=read.csv("/tmp/a",header=T,check.names=F)
t2=melt(t,id.var="Population")
lab=round(t2$value)
lab[lab<=2]="" # hide labels for shares of 2% or less
t2$lab=lab
t2$value=t2$value/100
p=ggplot(t2,aes(x=fct_rev(factor(Population,level=unique(Population))),y=value,fill=variable))+
geom_bar(stat="identity",width=1,position=position_fill(reverse=T))+
geom_text(aes(label=lab),position=position_stack(vjust=.5,reverse=T),size=2.5)+
coord_flip()+
theme(
axis.text=element_text(color="black"),
axis.text.x=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
legend.margin=margin(0),
legend.title=element_blank(),
panel.background=element_rect(fill="white")
)+
xlab("")+
scale_x_discrete(expand=c(0,0))+
scale_y_discrete(expand=c(0,0))
# save the plot explicitly instead of adding ggsave with +
ggsave("/tmp/a.png",p,width=6,height=7)'
The proportion of the Northern European component was defined based on the number of shared IBD segments with Estonians, Germans, and Swedes. So for example Swedes have a higher proportion of the Northern European component than Latvians.
BTW what was G25 made with? The AG user anglesqueville said it was made with SmartPCA (https://anthrogenica.com/showthread....ean-bias/page2):
G25 is not a so-called "calculator", it is a PCA calculated directly on a large "raw data" database (of allele readings) using a well-known program (smartpca, Eigensoft package, Nick Patterson).
However when I tried googling "smartpca site:eurogenes.blogspot.com", there were only two hits, neither of which even matched text written by Davidski.
It's possible to encode a 10,000 by 10,000 matrix of distances between populations as a 10,000 by 25 matrix where the columns are principal components. Then you can retrieve the original distance between two populations fairly accurately by calculating the Euclidean distance between their rows.
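As a rough illustration of that encoding, here is a pure-Python sketch of classical MDS: it double-centres the squared distances and extracts the top components by power iteration. It is a toy written for this thread, not what G25 or R's `cmdscale` does internally. The function name `classical_mds` and the rectangle test data are invented for the example.

```python
import math


def classical_mds(dist, k, iters=1000):
    """Return n rows of k coordinates from a symmetric n x n distance matrix."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centring turns squared distances into an inner-product matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * k for _ in range(n)]
    for c in range(k):
        # Power iteration for the largest remaining eigenpair of b.
        v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary fixed start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        if lam <= 1e-12:
            break  # nothing left to encode in further components
        for i in range(n):
            coords[i][c] = v[i] * math.sqrt(lam)
        # Deflate so the next pass finds the next component.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords


# Toy input: pairwise distances between the corners of a 3 x 4 rectangle.
rect = [[0, 3, 4, 5],
        [3, 0, 5, 4],
        [4, 5, 0, 3],
        [5, 4, 3, 0]]
xy = classical_mds(rect, k=2)
print(math.dist(xy[0], xy[3]))  # approximately 5, the original diagonal
```

With `k` smaller than the true dimensionality the recovered distances become approximations, which is the trade-off the 25-column format accepts.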
For example here I generated an 11 by 11 matrix of FST distances:
Code:
$ R -e 'library(admixtools);
f2m=function(x){t=as.data.frame(x[,1:3]);t2=rbind(t,setNames(t[,c(2,1,3)],names(t)));xtabs(t2[,3]~t2[,2]+t2[,1])};
fst=fst("g/v44.3_1240K_public/v44.3_1240K_public",c("Biaka.DG","Even.DG","Finnish.DG","Ju_hoan_North.DG","Khomani_San.DG","Korean.DG","Mbuti.DG","Mongola.DG","Papuan.DG","Turkey_N.DG","Yoruba.DG"));
write.csv(round(f2m(fst),6),"fst",quote=F)'
$ cat fst
,Biaka.DG,Even.DG,Finnish.DG,Ju_hoan_North.DG,Khomani_San.DG,Korean.DG,Mbuti.DG,Mongola.DG,Papuan.DG,Turkey_N.DG,Yoruba.DG
Biaka.DG,0,0.212276,0.182032,0.086521,0.093686,0.208092,0.055175,0.200832,0.264921,0.19757,0.037891
Even.DG,0.212276,0,0.099165,0.260155,0.269936,0.027304,0.243293,0.020451,0.188681,0.138516,0.189624
Finnish.DG,0.182032,0.099165,0,0.22675,0.236001,0.102589,0.211397,0.089601,0.188651,0.03734,0.156253
Ju_hoan_North.DG,0.086521,0.260155,0.22675,0,0.034955,0.255676,0.102751,0.247671,0.311007,0.244202,0.108353
Khomani_San.DG,0.093686,0.269936,0.236001,0.034955,0,0.264307,0.110281,0.256679,0.319966,0.253402,0.115599
Korean.DG,0.208092,0.027304,0.102589,0.255676,0.264307,0,0.238141,0.001142,0.178226,0.136865,0.184756
Mbuti.DG,0.055175,0.243293,0.211397,0.102751,0.110281,0.238141,0,0.230583,0.294664,0.228177,0.077978
Mongola.DG,0.200832,0.020451,0.089601,0.247671,0.256679,0.001142,0.230583,0,0.171326,0.130389,0.176566
Papuan.DG,0.264921,0.188681,0.188651,0.311007,0.319966,0.178226,0.294664,0.171326,0,0.215617,0.241977
Turkey_N.DG,0.19757,0.138516,0.03734,0.244202,0.253402,0.136865,0.228177,0.130389,0.215617,0,0.172992
Yoruba.DG,0.037891,0.189624,0.156253,0.108353,0.115599,0.184756,0.077978,0.176566,0.241977,0.172992,0
Classical multidimensional scaling (MDS) produces the same coordinates as PCA when the distances are Euclidean, but the difference is that it takes a distance matrix as an input. I used MDS to reduce the distance matrix to three principal components:
Code:
$ R -e 't=read.csv("fst",row.names=1,header=T);cmdscale(as.dist(t),k=3)'
                        [,1]         [,2]          [,3]
Biaka.DG          0.09458067 -0.009318035  0.0007634203
Even.DG          -0.10587237  0.033672133 -0.0493091783
Finnish.DG       -0.06971126  0.039180919  0.0443036464
Ju_hoan_North.DG  0.14384037 -0.005407783 -0.0079752958
Khomani_San.DG    0.15305612 -0.005072182 -0.0095401289
Korean.DG        -0.10263674  0.022172427 -0.0479094108
Mbuti.DG          0.12082958 -0.006742200 -0.0017669591
Mongola.DG       -0.09712661  0.017649424 -0.0402117613
Papuan.DG        -0.13332805 -0.137725617  0.0231446908
Turkey_N.DG      -0.07026603  0.060365299  0.0804792633
Yoruba.DG         0.06663432 -0.008774385  0.0080217135
Then even though there are only 3 principal components, I can still retrieve the original distance between a pair of populations fairly accurately:
Code:
$ R -e 't=read.csv("fst",row.names=1,header=T);c=cmdscale(as.dist(t),k=3);sqrt(sum((c["Biaka.DG",]-c["Even.DG",])^2))'
[1] 0.2110375
With 25 components, it's possible to encode the distances even between tens of thousands of populations more or less accurately. If more components were necessary, you could just as well make a G50 or G100 or something.
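The same check can be repeated for several pairs at once. The Python sketch below copies the three-component coordinates and the matching FST values from the output in this post and prints how far each recovered distance is from the original. The choice of pairs and the helper name `recovered_distance` are mine, and no claim is made here about how the errors would look with 25 components.

```python
import math

# MDS coordinates (k=3) copied from the cmdscale output in the post.
coords = {
    "Biaka.DG": (0.09458067, -0.009318035, 0.0007634203),
    "Even.DG": (-0.10587237, 0.033672133, -0.0493091783),
    "Finnish.DG": (-0.06971126, 0.039180919, 0.0443036464),
    "Turkey_N.DG": (-0.07026603, 0.060365299, 0.0804792633),
    "Korean.DG": (-0.10263674, 0.022172427, -0.0479094108),
    "Mongola.DG": (-0.09712661, 0.017649424, -0.0402117613),
}

# Original FST values for the same pairs, copied from the matrix in the post.
fst = {
    ("Biaka.DG", "Even.DG"): 0.212276,
    ("Finnish.DG", "Turkey_N.DG"): 0.03734,
    ("Korean.DG", "Mongola.DG"): 0.001142,
}


def recovered_distance(a, b):
    """Euclidean distance between two populations' coordinate rows."""
    return math.dist(coords[a], coords[b])


for (a, b), original in fst.items():
    d = recovered_distance(a, b)
    print(f"{a} vs {b}: FST={original:.6f} "
          f"recovered={d:.6f} abs_error={abs(d - original):.6f}")
```

Comparing absolute and relative errors across pairs is a quick way to judge whether a given number of components is enough for the populations of interest.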
Very good. You're thinking outside the box! Yes, of course you can make a calculator based on FST or IBS. You can run IBS between the target and WHG, ENF, ANS, etc., and even square the individual results to create bigger differences between them, then assign each a prorated share of 100%.
At least it wouldn't have the biases and variability of results that G25 or Admixture have, where the results depend on the other samples in the runs.
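A minimal Python sketch of that squaring-and-prorating idea follows. The function name `prorated_shares` is invented, and because the thread gives no IBS values against WHG, ENF or ANS, three modern populations from the Kurdish IBS table earlier in the thread stand in as references purely for illustration.

```python
def prorated_shares(similarity, power=2):
    """Raise each IBS similarity to `power`, then rescale to sum to 100%."""
    weights = {ref: s ** power for ref, s in similarity.items()}
    total = sum(weights.values())
    return {ref: 100.0 * w / total for ref, w in weights.items()}


# Stand-in references; DST values copied from the Kurdish IBS table.
ibs = {"Lezgin": 0.85119, "Han": 0.83945, "Yoruba": 0.78690}

for ref, share in prorated_shares(ibs).items():
    print(f"{ref}: {share:.1f}%")
```

Raw IBS values sit in a narrow range, so even after squaring the shares come out fairly close to one another. The `power` argument is there so a larger exponent can be tried if more separation is wanted.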