Why is the amount of East Eurasian ancestry of Saamis and other Uralics underestimated by some here?
Zanzibar
03-09-2021, 06:05 AM
I have seen some users here underestimate or downplay the East Eurasian ancestry of some Finno-Ugrics such as Saami, saying that they have only a minor 5-10% Mongoloid, or maybe 15%, and acting as if they are not that different from the average Euros, when that's not true at all.
They, along with VURers such as Udmurt and Mari, have around 25-35% Mongoloid on average, and some approach 45-50%, especially when counting groups like Khanty, Mansi and some Turkics like Bashkirs, who are literally "balanced" Eurasians.
In G25, Saami are around 27% Mongoloid and Saami_Kola close to 20%, while Mari are closer to 32% East Eurasian and Chuvash and Udmurts are both around 25%. The Mongoloid component is Krasnoyarsk_BA/kra001, an ancient Siberian population most closely related to the Nganasan, Yukaghir and Evenk. In their amount of East Eurasian ancestry, Saamis and these VURers are the literal opposite version of Altaians, Kyrgyz, Khakass and some Kazakhs, who have around 25-32% West Eurasian.
Target: Saami
Distance: 2.8661% / 0.02866055
34.4 Baltic_EST_BA
26.4 RUS_Krasnoyarsk_BA
24.6 FIN_Levanluhta_IA_o
14.6 Yamnaya_KAZ_Mereke
Target: Saami_Kola
Distance: 1.8163% / 0.01816303
44.2 Baltic_EST_BA
21.4 FIN_Levanluhta_IA_o
19.4 RUS_Krasnoyarsk_BA
6.6 Yamnaya_KAZ_Mereke
4.8 SWE_BA
3.6 UKR_Sredny_Stog_II_En
Target: Mari
Distance: 8.7117% / 0.08711679
46.0 UKR_Sredny_Stog_II_En
31.8 RUS_Krasnoyarsk_BA
22.2 Baltic_EST_BA
Target: Chuvash
Distance: 5.7643% / 0.05764286
50.8 UKR_Sredny_Stog_II_En
24.6 Baltic_EST_BA
24.6 RUS_Krasnoyarsk_BA
Target: Udmurt
Distance: 2.2666% / 0.02266610
47.8 UKR_Sredny_Stog_II_En
24.6 RUS_Krasnoyarsk_BA
13.8 Baltic_EST_BA
13.8 Yamnaya_KAZ_Mereke
The distance fits for Mari and Chuvash are horrible, but both are genetically very drifted in G25.
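For readers unfamiliar with the "Distance" line in these outputs: as far as I understand it, this is the Euclidean distance between the target's G25 coordinates and the weighted average of the source coordinates, shown both as a percentage and as a raw value. A minimal sketch in R, assuming that definition and using made-up 3-dimensional coordinates (real G25 rows have 25 dimensions; `fit_distance`, `src_A`, `src_B` and all numbers are hypothetical):

```r
# Hypothetical toy coordinates; real G25 rows have 25 columns.
sources <- rbind(
  src_A = c(0.10,  0.02, -0.01),
  src_B = c(0.02, -0.30,  0.05)
)
target <- c(0.08, -0.06, 0.00)

# Distance between the target and a weighted mix of the sources.
# Weights are proportions, one per source row, summing to 1.
fit_distance <- function(target, sources, weights) {
  stopifnot(abs(sum(weights) - 1) < 1e-9, length(weights) == nrow(sources))
  mix <- colSums(sources * weights)  # row i is scaled by weights[i]
  sqrt(sum((target - mix)^2))
}

d <- fit_distance(target, sources, c(0.75, 0.25))
print(d)
```

A poor fit, like the Mari and Chuvash models above, means no weighting of the chosen sources lands close to the target in this space.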
In their amount of Mongoloid ancestry, Saami, Chuvash, Mari and Udmurt are literally the reverse version of Altaian, Kyrgyz and Khakass, who are around 25-32% Caucasoid.
Target: Altaian
Distance: 2.1939% / 0.02193893
49.2 MNG_North_N
16.2 RUS_Shamanka_N
10.0 RUS_Afanasievo
8.2 Oroqen
7.6 RUS_Sintashta_MLBA
4.6 TUR_Barcin_N
4.2 TJK_Sarazm_En
Target: Althai_Kizhi
Distance: 2.4724% / 0.02472399
44.6 MNG_North_N
24.4 RUS_Shamanka_N
10.6 RUS_Afanasievo
8.8 RUS_Sintashta_MLBA
5.8 Oroqen
3.2 TUR_Barcin_N
2.6 TJK_Sarazm_En
Target: Kirghiz
Distance: 2.0017% / 0.02001725
23.0 Oroqen
22.4 MNG_North_N
21.4 RUS_Devils_Gate_Cave_N
15.4 RUS_Sintashta_MLBA
8.8 TJK_Sarazm_En
4.6 TUR_Barcin_N
4.4 RUS_Afanasievo
Target: Kirghiz_China
Distance: 2.3150% / 0.02314983
29.2 Oroqen
20.2 RUS_Devils_Gate_Cave_N
19.8 RUS_Sintashta_MLBA
17.4 MNG_North_N
11.2 TJK_Sarazm_En
2.2 TUR_Barcin_N
Target: Khakass
Distance: 2.8290% / 0.02828997
67.6 RUS_Shamanka_N
15.2 RUS_Afanasievo
15.2 RUS_Sintashta_MLBA
1.2 TJK_Sarazm_En
0.8 TUR_Barcin_N
Target: Khakass_Kachins
Distance: 2.3621% / 0.02362103
56.2 RUS_Shamanka_N
16.8 MNG_North_N
12.6 RUS_Afanasievo
8.6 RUS_Sintashta_MLBA
3.6 TUR_Barcin_N
1.8 Oroqen
0.4 TJK_Sarazm_En
Target: Kazakh_China
Distance: 2.3730% / 0.02372980
32.0 Oroqen
20.8 RUS_Devils_Gate_Cave_N
19.2 RUS_Sintashta_MLBA
18.2 MNG_North_N
8.8 TJK_Sarazm_En
1.0 TUR_Barcin_N
Only the Kazakh are more Caucasoid than the Mari, Udmurt, Saami, Chuvash are Mongoloid:
Target: Kazakh
Distance: 1.9419% / 0.01941909
24.2 Oroqen
21.2 MNG_North_N
14.8 RUS_Sintashta_MLBA
14.4 RUS_Devils_Gate_Cave_N
9.0 RUS_Afanasievo
8.6 TJK_Sarazm_En
7.8 TUR_Barcin_N
Therefore, Mari, Chuvash, Udmurt and Saami are literally the opposite version of Altaians, Kyrgyz and Khakass. In my opinion, these Uralics have enough Mongoloid to be seen more as a Hapa or transitional race between Europeans and Asians than as only European.
Now if we include the Bashkir, Mansi and Khanty, they are literally Eurasians/Hapas as they are around 47-50% Mongoloid.
Target: Bashkir
Distance: 2.2061% / 0.02206102
52.2 UKR_Sredny_Stog_II_En
30.0 RUS_Shamanka_N
10.6 RUS_Krasnoyarsk_BA
4.0 Baltic_EST_BA
3.2 Yamnaya_KAZ_Mereke
Target: Mansi
Distance: 4.7446% / 0.04744556
48.4 RUS_Krasnoyarsk_BA
32.0 UKR_Sredny_Stog_II_En
16.8 RUS_AfontovaGora3
2.8 Baltic_EST_BA
Target: Khanty
Distance: 4.7254% / 0.04725353
50.0 RUS_Krasnoyarsk_BA
30.6 UKR_Sredny_Stog_II_En
19.4 RUS_AfontovaGora3
Target: Khants
Distance: 4.6029% / 0.04602942
49.6 RUS_Krasnoyarsk_BA
31.0 UKR_Sredny_Stog_II_En
16.8 RUS_AfontovaGora3
2.6 Baltic_EST_BA
P.S. The Bashkir need Shamanka_N to improve their fit, as they have significant Turkic ancestry. Surprisingly, the Chuvash don't need any Shamanka_N, but maybe that's because they are genetically drifted.
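The totals quoted in this post can be recomputed from the model outputs by summing the sources on each side. A small sketch in R, where the grouping of sources into East Eurasian is an assumption made for illustration (the post does not spell it out) and `east_total` is a hypothetical helper:

```r
# Assumed grouping of sources (not stated explicitly in the post).
east_sources <- c("RUS_Krasnoyarsk_BA", "RUS_Shamanka_N", "MNG_North_N",
                  "Oroqen", "RUS_Devils_Gate_Cave_N")

# Model outputs copied from the post (percentages).
models <- list(
  Saami   = c(Baltic_EST_BA = 34.4, RUS_Krasnoyarsk_BA = 26.4,
              FIN_Levanluhta_IA_o = 24.6, Yamnaya_KAZ_Mereke = 14.6),
  Bashkir = c(UKR_Sredny_Stog_II_En = 52.2, RUS_Shamanka_N = 30.0,
              RUS_Krasnoyarsk_BA = 10.6, Baltic_EST_BA = 4.0,
              Yamnaya_KAZ_Mereke = 3.2),
  Altaian = c(MNG_North_N = 49.2, RUS_Shamanka_N = 16.2, RUS_Afanasievo = 10.0,
              Oroqen = 8.2, RUS_Sintashta_MLBA = 7.6, TUR_Barcin_N = 4.6,
              TJK_Sarazm_En = 4.2)
)

# Sum the percentages of the sources assumed to be East Eurasian.
east_total <- function(model) sum(model[names(model) %in% east_sources])

east  <- sapply(models, east_total)
other <- sapply(models, sum) - east
print(round(rbind(east, other), 1))
```

Under this grouping the Saami model sums to 26.4 East Eurasian, the Bashkir model to 40.6 and the Altaian model to 73.6; moving a source such as RUS_AfontovaGora3 into either group changes the totals accordingly.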
Lemminkäinen
03-09-2021, 08:10 AM
Because the Admixture program usually calculates non-European admixture using European references (assumed to be the zero level), while Eurasian admixture is present almost everywhere in Europe. But I don't trust G25 either, because (being Davidski's test?) it is based on PCA components, and PCA results depend on the sample set used. If some population or group is underrepresented it gets too little weight, and conversely. If it is extremely inbred, or even overrepresented, it gets too much weight.
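A toy illustration of the sample-set point, as a sketch rather than anything specific to G25: three made-up groups at fixed positions in two dimensions, where only the number of samples per group changes. `leading_axis` and the coordinates are hypothetical.

```r
# Three hypothetical groups at fixed 2-D positions.
centers <- rbind(A = c(0, 0), B = c(2, 0), C = c(1, 1.5))

# Build a sample set with n[i] copies of group i and return the
# direction of the first principal component (a unit vector).
leading_axis <- function(n) {
  pts <- centers[rep(rownames(centers), times = n), , drop = FALSE]
  eigen(cov(pts))$vectors[, 1]
}

few_C  <- leading_axis(c(A = 10, B = 10, C = 2))   # C underrepresented
many_C <- leading_axis(c(A = 10, B = 10, C = 40))  # C overrepresented
print(round(abs(rbind(few_C, many_C)), 3))
```

With few C samples the first component runs along the A-B axis; with many C samples it rotates to the axis that separates C from the rest, even though no group has moved.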
Komintasavalta
03-09-2021, 08:39 AM
I tried doing qpAdm models of the population named Saami.DG in the v44.3_HO dataset. I excluded models with one or more negative weights (where feasible is false) and sorted the models by p-value.
I'm probably doing something wrong, and I still don't know how to pick the outgroups. I mostly just picked outgroups that resulted in little decrease in the number of SNPs that remained after filtering. I also tried to pick left populations that resulted in little decrease in the SNP count.
I got 374794 out of 597573 SNPs after filtering, out of which 349558 were polymorphic.
https://i.ibb.co/vcTCKNz/b.png
In the image above, the models whose p-value is above .05 have a fairly constant 30-35% Nganasan ancestry. However, EHG, CHG and SHG are also part Mongoloid, so if we consider Nganasan to be fully Mongoloid, Saami might be closer to 40% than 30% Mongoloid.
Both individuals in the population Saami.DG were from Utsjoki, which is part of the Northern Saami region within Finland:
$ awk 'NR==1||/Saami...DG/' g/v44.3_HO_public/v44.3_HO_public.anno|cut -f2,4,9,10|tr \\t \;
Version ID;Publication (or OK to use in a paper);Locality;Country
S_Saami-1.DG;MallickNature2016;Utsjoki;Finland
S_Saami-2.DG;MallickNature2016;Utsjoki;Finland
Among Finnish Saami, there are an estimated 2,000 speakers of Northern Saami, 300 speakers of Inari Saami, and 300 speakers of Skolt Saami (https://fi.wikipedia.org/wiki/Saamelaiskielet). Out of four groups of Saami measured by Karin Mark, Skolt Saami had the lightest pigmentation, followed by Inari Saami, Finnish Northern Saami, and Kola Saami (https://www.etis.ee/Portal/Publications/Display/1fd319c0-7408-4e31-9f18-b9b3010eabad).
Scandinavian Northern Saami might be even more Mongoloid than Finnish Northern Saami, or at least Coon wrote that the Saami of the Scandinavian inland were the darkest and most brachycephalic (https://www.theapricity.com/snpa/chapter-IX2.htm):
The selected "pure" groups, Bryn's Reindeer Lapps, and some of Geyer's mountain and forest Lapps from Sweden, have seventy per cent or over of this dark hair, while the fairest Lapps, with a majority of brown and blond shades, are found in Finland and in the Kola Peninsula.
Pure dark eyes are found among one-third of Reindeer Lapps, and among as few as eight per cent in the total of Lapps from Norway.[14] Pure light and light-mixed eyes are commonest among the Lapps of Finland, where they total between thirty and forty per cent, and least common among the Reindeer Lapps of interior Norway and Sweden. Even among the purest selected sub-groups, such as that of Geyer, who isolated from a larger Swedish Lapp sample a few individuals of most pronounced Lappish type, at least a third are light or light-mixed in iris color. [...]
There are, however, regional differences; the center of extreme round headedness lies among the inland groups in northern Norway, while the Swedish, Finnish, and Kola Peninsula Lapps become progressively narrower headed. The mean for the purest Reindeer Lapps of Norway is 87; for the easternmost Lapps, 80 to 83.
Code for ADMIXTOOLS 2:
library(admixtools)
library(tidyverse)
target="Saami.DG"
left=c("Turkey_Boncuklu_N.SG","Armenia_Caucasus_KuraAraxes","Latvia_HG","Sweden_Motala_HG","Russia_HG_Karelia","Russia_HG_Tyumen","Nganasan")
right=c("Mbuti.DG","Mixe.DG","Ami.DG","Papuan.DG","Chimp.REF","Ju_hoan_North","Biaka.DG","Yoruba.DG","Altai_Neanderthal.DG")
pops=c(left,right,target)
# recompute the f2 statistics from scratch for the populations used
unlink("/tmp/f2",recursive=T)
extract_f2(pref="g/v44.3_HO_public/v44.3_HO_public",pops=pops,outdir="/tmp/f2")
f2=f2_from_precomp("/tmp/f2")
qp=qpadm(f2,left=left,right=right,target=target)
# keep feasible models, sort by p, and keep only pat, p and the per-source weight columns
qp2=qp$popdrop%>%dplyr::filter(feasible==T&f4rank!=0)%>%arrange(desc(p))%>%dplyr::select(!c(wt,dof,chisq,f4rank,dofdiff,chisqdiff,p_nested,feasible,best))
write_csv(qp2,"/tmp/qp")
Code to generate the bar chart:
library(tidyverse)
library(reshape2)
library(colorspace)
t=read_csv("/tmp/qp")
# t=t[t$p>.05,]
pvalue=sub("^0","",sprintf("%.3f",t$p))
t=t[-2]
t2=melt(t,id.var="pat")
ggplot(t2,aes(x=fct_rev(factor(pat,levels=t$pat)),y=value,fill=variable))+
geom_bar(stat="identity",width=1,position=position_fill(reverse=T))+
geom_text(aes(label=round(100*value)),position=position_stack(vjust=.5,reverse=T),size=3.5)+
coord_flip()+
theme(
axis.text.x=element_blank(),
axis.text=element_text(color="black"),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
legend.box.just="center",
legend.box.margin=margin(0),
legend.box.spacing=unit(.05,"in"),
legend.direction="vertical",
legend.justification="center",
legend.margin=margin(0),
legend.text=element_text(size=12),
legend.title=element_blank(),
panel.border=element_blank(),
text=element_text(size=16)
)+
xlab("")+
scale_x_discrete(labels=rev(pvalue),expand=c(0,0)) +
scale_y_continuous(expand=c(0,0))+
scale_fill_manual("legend",values=hex(HSV(c(45,45,210,210,120,120,300),c(.6, .6,.6,.6,.6,.6,.6),c(1,.6,1,.6,1,.6,1))))
ggsave("/tmp/a.png",width=7,height=7)
I like how you used R to visualize the results and sort by p-value, and that you posted the details of how you ran it (although I can't see your SE). Looking at the details, here's how you can improve the accuracy of your models:
1- The pright populations are used as references to distinguish between the various sources used in the model, so they should be pretty diverse in their ancestry. I noticed that Africans are overrepresented in pright. You really only need one group of Africans unless you are trying to model a target using multiple African pops. I would just keep Mbuti and drop the rest of the Africans from pright.
2- Neanderthal and Chimp are pretty useless for differentiating between Eurasians, because all Eurasians are pretty much similarly related to them (even with Neanderthal, the difference between various Eurasians is just a couple of percent). Drop them.
3- I would add the Mesolithic Ancient Siberian Kolyma-Diploid to pright, because some of your sources are quite differentially related to it owing to its mix of ancient E Asian and Siberian (Yana type) ancestry.
4- I would also for sure add CHG (the sample labeled Kotias KK1 has the highest number of SNPs in the dataset you are using), because some of your sources are quite differentially related to it.
5- I would also for sure add WHG (the sample labeled Bichon Bichon has the highest number of SNPs in the dataset you are using), because some of your sources are quite differentially related to it.
6- I would also add Tyumen to pright and use Devils-Gate in the sources for Neolithic E Asian.
7- I would add Iran-N to pright.
8- I would add Iberomaurusian. These have decent SNP counts:
Morocco_Iberomaurusian TAF010
Morocco_Iberomaurusian TAF011
Morocco_Iberomaurusian TAF013
Morocco_Iberomaurusian TAF014
9- I would add Kostenki14 I0876 to pright. It has many SNPs.
10- I would add GoyetQ116-1 Q116-1 to pright.
11- Drop Nganasan from the sources and add Shamanka-EN instead. It's always better to keep it consistently ancient. Shamanka would be more ancestral to Uralics than Nganasan.
These are decent ones:
Russia_Shamanka_Eneolithic DA245 3772618 4676043 0.81
Russia_Shamanka_Eneolithic DA246 3680753 4668444 0.79
Russia_Shamanka_Eneolithic DA247 3733439 4676043 0.80
Russia_Shamanka_Eneolithic DA248 3744767 4676043 0.80
Russia_Shamanka_Eneolithic DA249 3590256 4668444 0.77
Russia_Shamanka_Eneolithic DA252 3732230 4668444 0.80
Russia_Shamanka_Eneolithic DA253 3703323 4668444 0.79
12- You really need Anatolia-N in pright as well if you're not going to use it as a source.
Your SNP count and p-values will drop, but your models will be significantly more accurate and your SE should be better too.
Don't forget to download the latest Admixtools 2. It has some significant fixes.
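The points above can be turned into concrete edits on the original `left` and `right` vectors. A sketch in R, with loud caveats: every label in angle brackets is a made-up placeholder for a population named in the advice (the exact labels have to be looked up in the .anno file), removing Tyumen from the sources when it moves to pright is one reading of point 6, and point 12 is skipped on the assumption that Turkey_Boncuklu_N.SG stays in as the Anatolian source.

```r
left  <- c("Turkey_Boncuklu_N.SG", "Armenia_Caucasus_KuraAraxes", "Latvia_HG",
           "Sweden_Motala_HG", "Russia_HG_Karelia", "Russia_HG_Tyumen", "Nganasan")
right <- c("Mbuti.DG", "Mixe.DG", "Ami.DG", "Papuan.DG", "Chimp.REF",
           "Ju_hoan_North", "Biaka.DG", "Yoruba.DG", "Altai_Neanderthal.DG")

# Points 1-2: keep Mbuti as the only African group, drop Chimp and Neanderthal.
right <- setdiff(right, c("Ju_hoan_North", "Biaka.DG", "Yoruba.DG",
                          "Chimp.REF", "Altai_Neanderthal.DG"))

# Points 3-10: add references. "<...>" labels are placeholders, not real labels.
right <- union(right, c("<Kolyma>", "<CHG_KK1>", "<WHG_Bichon>",
                        "Russia_HG_Tyumen", "<Iran_N>",
                        "Morocco_Iberomaurusian", "<Kostenki14>",
                        "<GoyetQ116-1>"))

# Points 6 and 11: swap the sources (Tyumen removal is an assumption).
left <- setdiff(left, c("Nganasan", "Russia_HG_Tyumen"))
left <- union(left, c("Russia_Shamanka_Eneolithic", "<Devils_Gate>"))

# A population must not appear on both sides of a qpAdm model.
stopifnot(length(intersect(left, right)) == 0)
print(list(left = left, right = right))
```

The disjointness check is the useful part: it catches the case where a population is added to pright while still listed as a source.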
Lemminkäinen
03-09-2021, 10:46 AM
I have always used the right group used previously by studies or by Davidski, to have something to compare with. In principle I understand that the right group should be built of ancestral populations sharing common ancestry with all left populations, archaic enough to cover all left populations and not remarkably dominant for any of them, but not so distant that they are equally distant from all.
P-values lower than 0.05 are significant against the null hypothesis, which in qpAdm is that the model fits, so those models are rejected.
E Asians have higher genetic similarity with Saamis than with other mainland Europeans. I wouldn't rely on G25 for that, though, because the results can be misleading. In general, with any calculator the amount of E Asian will change depending on what other components the calculator uses.
You have to do a gene-to-gene comparison between E Asian and each European population one at a time to get an accurate picture. Here is IBS similarity with a Mongola sample, based on PLINK's --genome flag using 400,000 SNPs.
As you can see, Mongola has about the same amount of IBS with Saamis as with some S Asians, and not that much more than with the Iraqi Kurd or some Finns, which would be a shocker to you if you just went by calculator results.
NO FID1 FID2 IID2 PI_HAT IBS
1 S_Mongola-1 Korean S_Korean-1 0.157 0.81068
2 S_Mongola-1 Han S_Han-1 0.1538 0.81020
3 S_Mongola-1 Japanese S_Japanese-1 0.1603 0.80999
4 S_Mongola-1 Xibo S_Xibo-2 0.1463 0.80968
5 S_Mongola-1 Korean S_Korean-2 0.1546 0.80955
6 S_Mongola-1 Han S_Han-2 0.1562 0.80910
7 S_Mongola-1 Tujia S_Tujia-2 0.1522 0.80896
8 S_Mongola-1 Japanese S_Japanese-2 0.148 0.80880
9 S_Mongola-1 She S_She-1 0.1542 0.80875
10 S_Mongola-1 She S_She-2 0.1535 0.80870
11 S_Mongola-1 Naxi S_Naxi-1 0.1527 0.80869
12 S_Mongola-1 Japanese S_Japanese-3 0.1426 0.80865
13 S_Mongola-1 Hezhen S_Hezhen-2 0.1438 0.80863
14 S_Mongola-1 Yi S_Yi-1 0.1494 0.80853
15 S_Mongola-1 Xibo S_Xibo-1 0.1408 0.80837
16 S_Mongola-1 Miao S_Miao-2 0.1534 0.80827
17 S_Mongola-1 Kinh S_Kinh-1 0.1488 0.80800
18 S_Mongola-1 Naxi S_Naxi-3 0.1516 0.80795
19 S_Mongola-1 Hezhen S_Hezhen-1 0.1514 0.80782
20 S_Mongola-1 Tujia S_Tujia-1 0.1519 0.80772
21 S_Mongola-1 Mongola S_Mongola-2 0.1456 0.80755
22 S_Mongola-1 Miao S_Miao-1 0.1518 0.80748
23 S_Mongola-1 Ulchi S_Ulchi-1 0.1642 0.80746
24 S_Mongola-1 Oroqen S_Oroqen-1 0.1575 0.80745
25 S_Mongola-1 Yi S_Yi-2 0.1529 0.80724
26 S_Mongola-1 Daur S_Daur-2 0.1422 0.80716
27 S_Mongola-1 Ulchi S_Ulchi-2 0.1566 0.80713
28 S_Mongola-1 Oroqen S_Oroqen-2 0.1588 0.80693
29 S_Mongola-1 Dai S_Dai-1 0.1463 0.80672
30 S_Mongola-1 Even S_Even-3 0.1583 0.80661
31 S_Mongola-1 Dai S_Dai-2 0.1519 0.80603
32 S_Mongola-1 Tu S_Tu-2 0.1387 0.80580
33 S_Mongola-1 Kinh S_Kinh-2 0.1415 0.80574
34 S_Mongola-1 Thai S_Thai-2 0.1401 0.80573
35 S_Mongola-1 China_Lahu S_Lahu-1 0.1524 0.80558
36 S_Mongola-1 Burmese S_Burmese-1 0.1385 0.80540
37 S_Mongola-1 Tu S_Tu-1 0.1354 0.80530
38 S_Mongola-1 Ami.DG S_Ami1 0.1575 0.80503
39 S_Mongola-1 Ami.DG S_Ami2 0.1595 0.80502
40 S_Mongola-1 Even S_Even-2 0.1555 0.80488
41 S_Mongola-1 Yakut S_Yakut-1 0.1485 0.80419
42 S_Mongola-1 China_Lahu S_Lahu-2 0.1523 0.80397
43 S_Mongola-1 Igorot S_Igorot-2 0 0.80313
44 S_Mongola-1 Dusun S_Dusun-2 0 0.80309
45 S_Mongola-1 Dusun S_Dusun-1 0 0.80308
46 S_Mongola-1 Thai S_Thai-1 0.1275 0.80306
47 S_Mongola-1 Igorot S_Igorot-1 0 0.80301
48 S_Mongola-1 Cambodian S_Cambodian-1 0.1407 0.80241
49 S_Mongola-1 Even S_Even-1 0.1214 0.80214
50 S_Mongola-1 Burmese S_Burmese-2 0.1169 0.80213
51 S_Mongola-1 Yakut S_Yakut-2 0.1438 0.80209
52 S_Mongola-1 Cambodian S_Cambodian-2 0.134 0.80188
53 S_Mongola-1 Eskimo_Sireniki.DG S_Sireniki1 0 0.80124
54 S_Mongola-1 Kyrgyz_Kyrgyzstan S_Kyrgyz-1 0.1127 0.79908
55 S_Mongola-1 Kyrgyz_Kyrgyzstan S_Kyrgyz-2 0.1005 0.79815
56 S_Mongola-1 Itelmen S_Itelman-1 0 0.79809
57 S_Mongola-1 Eskimo_Naukan.DG S_Naukan2 0 0.79789
58 S_Mongola-1 Eskimo_Chaplin.DG S_Chaplin1 0 0.79770
59 S_Mongola-1 Eskimo_Naukan.DG S_Naukan1 0 0.79751
60 S_Mongola-1 Eskimo_Sireniki.DG S_Sireniki2 0 0.79749
61 S_Mongola-1 Kusunda S_Kusunda-1 0.1132 0.79740
62 S_Mongola-1 Tubalar S_Tubalar-2 0 0.79509
63 S_Mongola-1 Tubalar S_Tubalar-1 0.1107 0.79490
64 S_Mongola-1 Chukchi S_Chukchi-1 0.0841 0.79357
65 S_Mongola-1 Uyghur S_Uygur-1 0.0898 0.79336
66 S_Mongola-1 Mexico_Zapotec.DG S_Zapotec1 0 0.79282
67 S_Mongola-1 Mansi S_Mansi-1 0 0.79238
68 S_Mongola-1 Hazara S_Hazara-1 0 0.79204
69 S_Mongola-1 Pima S_Pima-1 0 0.79198
70 S_Mongola-1 Uyghur S_Uygur-2 0 0.79197
71 S_Mongola-1 Hazara S_Hazara-2 0 0.79170
72 S_Mongola-1 Mayan S_Mayan-2 0 0.79120
73 S_Mongola-1 Mixtec S_Mixtec-1 0 0.79120
74 S_Mongola-1 Mixe S_Mixe-2 0 0.79115
75 S_Mongola-1 Mexico_Zapotec.DG S_Zapotec2 0 0.79101
76 S_Mongola-1 Mayan S_Mayan-1 0 0.79087
77 S_Mongola-1 Quechua S_Quechua-3 0 0.79075
78 S_Mongola-1 Mixe S_Mixe-3 0 0.79044
79 S_Mongola-1 Piapoco S_Piapoco-2 0 0.79029
80 S_Mongola-1 Quechua S_Quechua-1 0 0.79023
81 S_Mongola-1 Quechua S_Quechua-2 0 0.78995
82 S_Mongola-1 Pima S_Pima-2 0 0.78978
83 S_Mongola-1 Mansi S_Mansi-2 0 0.78962
84 S_Mongola-1 Khonda_Dora S_Khonda_Dora-1 0 0.78847
85 S_Mongola-1 Tlingit S_Tlingit-2 0 0.78816
86 S_Mongola-1 Mixtec S_Mixtec-2 0 0.78811
87 S_Mongola-1 Maori S_Maori-1 0.0542 0.78805
88 S_Mongola-1 Piapoco S_Piapoco-1 0 0.78747
89 S_Mongola-1 Karitiana S_Karitiana-2 0 0.78742
90 S_Mongola-1 Surui S_Surui-1 0 0.78727
91 S_Mongola-1 Surui S_Surui-2 0 0.78565
92 S_Mongola-1 Karitiana S_Karitiana-1 0 0.78561
93 S_Mongola-1 Bengali S_Bengali-1 0 0.78436
94 S_Mongola-1 Kusunda S_Kusunda-2 0 0.78408
95 S_Mongola-1 Tlingit S_Tlingit-1 0 0.78388
96 S_Mongola-1 Relli S_Relli-1 0 0.78344
97 S_Mongola-1 Kapu S_Kapu-2 0 0.78280
98 S_Mongola-1 Madiga S_Madiga-1 0 0.78227
99 S_Mongola-1 Madiga S_Madiga-2 0 0.78175
100 S_Mongola-1 Mala S_Mala-3 0 0.78161
101 S_Mongola-1 Yadava S_Yadava-1 0 0.78157
102 S_Mongola-1 Bengali S_Bengali-2 0 0.78140
103 S_Mongola-1 Kapu S_Kapu-1 0 0.78130
104 S_Mongola-1 Irula S_Irula-2 0 0.78128
105 S_Mongola-1 Mala S_Mala-2 0 0.78128
106 S_Mongola-1 Punjabi S_Punjabi-1 0 0.78107
107 S_Mongola-1 Irula S_Irula-1 0 0.78107
108 S_Mongola-1 Burusho S_Burusho-2 0 0.78081
109 S_Mongola-1 Yadava S_Yadava-2 0 0.78078
110 S_Mongola-1 Saami S_Saami-1 0 0.78063
111 S_Mongola-1 Brahmin S_Brahmin-2 0 0.78031
112 S_Mongola-1 Saami S_Saami-2 0 0.78012
113 S_Mongola-1 Relli S_Relli-2 0 0.77974
114 S_Mongola-1 Punjabi S_Punjabi-3 0 0.77920
115 S_Mongola-1 Bougainville S_Bougainville-1 0 0.77900
116 S_Mongola-1 Burusho S_Burusho-1 0 0.77885
117 S_Mongola-1 Punjabi S_Punjabi-2 0 0.77885
118 S_Mongola-1 Brahmin S_Brahmin-1 0 0.77874
119 S_Mongola-1 Bougainville S_Bougainville-2 0 0.77866
120 S_Mongola-1 Sindhi S_Sindhi-2 0 0.77851
121 S_Mongola-1 Pathan S_Pathan-1 0 0.77838
122 S_Mongola-1 Punjabi S_Punjabi-4 0 0.77776
123 S_Mongola-1 Kurd-Iraq WGS 0 0.77625
124 S_Mongola-1 Pathan S_Pathan-2 0 0.77597
125 S_Mongola-1 Ossetian-North S_Ossetian-1 0 0.77575
126 S_Mongola-1 Russian S_Russian-1 0 0.77570
127 S_Mongola-1 Finnish S_Finnish-1 0 0.77476
128 S_Mongola-1 Sindhi S_Sindhi-1 0 0.77473
129 S_Mongola-1 Turkish-Kayseri S_Turkish-Kayseri-1 0 0.77463
130 S_Mongola-1 Tajik S_Tajik-2 0 0.77448
131 S_Mongola-1 YANA_UP_WGS Yana1 0 0.77422
132 S_Mongola-1 Ossetian-North S_Ossetian-2 0 0.77413
133 S_Mongola-1 Papuan S_Papuan-10 0 0.77381
134 S_Mongola-1 Balochi S_Balochi-2 0 0.77365
135 S_Mongola-1 Brahui S_Brahui-1 0 0.77363
136 S_Mongola-1 Adygei S_Adygei-1 0 0.77334
137 S_Mongola-1 Makrani S_Makrani-1 0 0.77334
138 S_Mongola-1 Finnish S_Finnish-3 0 0.77319
139 S_Mongola-1 Adygei S_Adygei-2 0 0.77319
140 S_Mongola-1 Kalash S_Kalash-2 0 0.77319
141 S_Mongola-1 Turkish-Kayseri S_Turkish-Kayseri-2 0 0.77319
142 S_Mongola-1 Chechen S_Chechen-1 0 0.77312
143 S_Mongola-1 Papuan S_Papuan-9 0 0.77307
144 S_Mongola-1 Russian S_Russian-2 0 0.77288
145 S_Mongola-1 Icelandic S_Icelandic-1 0 0.77260
146 S_Mongola-1 Finnish S_Finnish-2 0 0.77258
147 S_Mongola-1 Papuan S_Papuan-12 0 0.77257
148 S_Mongola-1 Kalash S_Kalash-1 0 0.77247
149 S_Mongola-1 Lezgin S_Lezgin-1 0 0.77245
150 S_Mongola-1 Papuan S_Papuan-8 0 0.77232
151 S_Mongola-1 Russia_Abkhasian S_Abkhasian-1 0 0.77197
152 S_Mongola-1 Iranian-Fars S_Iranian-Fars-1 0 0.77194
153 S_Mongola-1 Brahui S_Brahui-2 0 0.77178
154 S_Mongola-1 Russia_Abkhasian S_Abkhasian-2 0 0.77170
155 S_Mongola-1 Papuan S_Papuan-1 0 0.77164
156 S_Mongola-1 Norwegian S_Norwegian-1 0 0.77159
157 S_Mongola-1 Orcadian S_Orcadian-2 0 0.77158
158 S_Mongola-1 Estonian S_Estonian-1 0 0.77155
159 S_Mongola-1 Papuan S_Papuan-7 0 0.77150
160 S_Mongola-1 Papuan S_Papuan-11 0 0.77146
161 S_Mongola-1 Estonian S_Estonian-2 0 0.77144
162 S_Mongola-1 Papuan S_Papuan-13 0 0.77131
163 S_Mongola-1 Tajik S_Tajik-1 0 0.77131
164 S_Mongola-1 Papuan S_Papuan-14 0 0.77129
165 S_Mongola-1 Hungarian S_Hungarian-2 0 0.77120
166 S_Mongola-1 Czech S_Czech-2 0 0.77120
167 S_Mongola-1 Papuan S_Papuan-3 0 0.77119
168 S_Mongola-1 Icelandic S_Icelandic-2 0 0.77119
169 S_Mongola-1 Hungarian S_Hungarian-1 0 0.77111
170 S_Mongola-1 Polish S_Polish-1 0 0.77110
171 S_Mongola-1 Bulgarian S_Bulgarian-1 0 0.77106
172 S_Mongola-1 Greek S_Greek-1 0 0.77103
173 S_Mongola-1 Iranian-Fars S_Iranian-Fars-2 0 0.77103
174 S_Mongola-1 Papuan S_Papuan-5 0 0.77101
175 S_Mongola-1 French S_French-2 0 0.77082
176 S_Mongola-1 Georgian S_Georgian-1 0 0.77071
177 S_Mongola-1 Balochi S_Balochi-1 0 0.77062
178 S_Mongola-1 Spanish S_Spanish-1 0 0.77061
179 S_Mongola-1 Armenian S_Armenian-1 0 0.77054
180 S_Mongola-1 Papuan S_Papuan-6 0 0.77049
181 S_Mongola-1 Bergamo S_Bergamo-2 0 0.77017
182 S_Mongola-1 Papuan S_Papuan-2 0 0.77008
183 S_Mongola-1 Bulgarian S_Bulgarian-2 0 0.77007
184 S_Mongola-1 Papuan S_Papuan-4 0 0.77005
185 S_Mongola-1 Spanish S_Spanish-2 0 0.76981
186 S_Mongola-1 Greek S_Greek-2 0 0.76981
187 S_Mongola-1 Basque S_Basque-1 0 0.76979
188 S_Mongola-1 English S_English-1 0 0.76977
189 S_Mongola-1 Lezgin S_Lezgin-2 0 0.76975
190 S_Mongola-1 Tuscan S_Tuscan-2 0 0.76960
191 S_Mongola-1 Albanian.DG S_Albanian1 0 0.76953
192 S_Mongola-1 English S_English-2 0 0.76951
193 S_Mongola-1 Armenian S_Armenian-2 0 0.76950
194 S_Mongola-1 Sardinian S_Sardinian-2 0 0.76946
195 S_Mongola-1 Orcadian S_Orcadian-1 0 0.76909
196 S_Mongola-1 Tuscan S_Tuscan-1 0 0.76906
197 S_Mongola-1 Jew_Iraqi S_Iraqi_Jew-1 0 0.76901
198 S_Mongola-1 Basque S_Basque-2 0 0.76888
199 S_Mongola-1 Georgian S_Georgian-2 0 0.76886
200 S_Mongola-1 Jew_Iraqi S_Iraqi_Jew-2 0 0.76865
201 S_Mongola-1 Jordanian S_Jordanian-3 0 0.76809
202 S_Mongola-1 French S_French-1 0 0.76796
203 S_Mongola-1 BedouinB S_BedouinB-2 0 0.76779
204 S_Mongola-1 Druze S_Druze-1 0 0.76757
205 S_Mongola-1 Druze S_Druze-2 0 0.76754
206 S_Mongola-1 Makrani S_Makrani-2 0 0.76747
207 S_Mongola-1 Jew_Yemenite S_Yemenite_Jew-2 0 0.76622
208 S_Mongola-1 Jew_Yemenite S_Yemenite_Jew-1 0 0.76575
209 S_Mongola-1 Sardinian S_Sardinian-1 0 0.76564
210 S_Mongola-1 BedouinB S_BedouinB-1 0 0.76460
211 S_Mongola-1 Jordanian S_Jordanian-2 0 0.76413
212 S_Mongola-1 Samaritan S_Samaritan-1 0 0.76396
213 S_Mongola-1 Jordanian S_Jordanian-1 0 0.76261
214 S_Mongola-1 Saharawi S_Saharawi-2 0 0.75981
215 S_Mongola-1 Saharawi S_Saharawi-1 0 0.75964
216 S_Mongola-1 Mozabite S_Mozabite-1 0 0.75937
217 S_Mongola-1 Mozabite S_Mozabite-2 0 0.75824
222 S_Mongola-1 Somali S_Somali-1 0 0.74788
224 S_Mongola-1 Masai S_Masai-2 0 0.74381
226 S_Mongola-1 Masai S_Masai-1 0 0.74274
232 S_Mongola-1 Gambian S_Gambian-2 0 0.73200
233 S_Mongola-1 BantuKenya S_BantuKenya-1 0 0.73139
234 S_Mongola-1 Luo S_Luo-2 0 0.73107
235 S_Mongola-1 BantuKenya S_BantuKenya-2 0 0.73020
236 S_Mongola-1 Luhya S_Luhya-1 0 0.73005
237 S_Mongola-1 Luhya S_Luhya-2 0 0.73002
238 S_Mongola-1 Mandenka S_Mandenka-2 0 0.72934
239 S_Mongola-1 Gambian S_Gambian-1 0 0.72933
240 S_Mongola-1 Esan S_Esan-2 0 0.72920
241 S_Mongola-1 Yoruba S_Yoruba-2 0 0.72879
242 S_Mongola-1 Mandenka S_Mandenka-1 0 0.72872
243 S_Mongola-1 Yoruba S_Yoruba-1 0 0.72816
244 S_Mongola-1 Esan S_Esan-1 0 0.72810
245 S_Mongola-1 Mende S_Mende-1 0 0.72793
246 S_Mongola-1 Mende S_Mende-2 0 0.72788
247 S_Mongola-1 Biaka S_Biaka-1 0 0.72484
248 S_Mongola-1 Biaka S_Biaka-2 0 0.72347
249 S_Mongola-1 Mbuti S_Mbuti-3 0 0.72046
250 S_Mongola-1 Mbuti S_Mbuti-1 0 0.72010
251 S_Mongola-1 Mbuti S_Mbuti-2 0 0.72005
252 S_Mongola-1 Khomani_San S_Khomani_San-2 0 0.71521
253 S_Mongola-1 Ju_hoan_North S_Ju_hoan_North-2 0 0.71514
254 S_Mongola-1 Ju_hoan_North S_Ju_hoan_North-3 0 0.71460
255 S_Mongola-1 Khomani_San S_Khomani_San-1 0 0.71302
It's pretty amazing that the above IBS list was able to properly order Mbuti, Khomani, and Ju-Hoan in terms of IBS with Mongola.
Does anyone know why Mongola is slightly closer to Mbuti than to Khomani and Ju-Hoan?
Hint: The late paleolithic African paper :)
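For anyone wanting to reproduce the IBS column: assuming it corresponds to the DST value in PLINK's `--genome` output, it is the proportion of alleles shared identical-by-state, (IBS2 + 0.5 × IBS1) / number of SNPs compared. A minimal sketch in R on made-up genotypes coded as 0/1/2 copies of one allele (`ibs_similarity` is a hypothetical helper, not a PLINK function):

```r
# Genotypes as 0/1/2 allele counts; NA marks a missing call.
ibs_similarity <- function(g1, g2) {
  ok <- !is.na(g1) & !is.na(g2)        # use only SNPs called in both samples
  shared <- 2 - abs(g1[ok] - g2[ok])   # alleles shared IBS per SNP: 0, 1 or 2
  ibs1 <- sum(shared == 1)
  ibs2 <- sum(shared == 2)
  (ibs2 + 0.5 * ibs1) / sum(ok)
}

# Made-up genotypes for two samples at six SNPs.
a <- c(0, 1, 2, 2, NA, 1)
b <- c(0, 2, 2, 0, 1, 1)
print(ibs_similarity(a, b))
```

Because every pair of humans shares most alleles by state, the values in the list sit in a narrow band (roughly 0.71 to 0.81 here), so small differences in the third decimal carry the signal.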
Zanzibar
03-09-2021, 02:00 PM
E Asians have higher genetic similarity with Saamis than with other mainland Europeans. I wouldn't rely on G25 for that, though, because the results can be misleading. In general, with any calculator the amount of E Asian will change depending on what other components the calculator uses.
You have to do a gene-to-gene comparison between E Asian and each European population one at a time to get an accurate picture. Here is IBS similarity with a Mongola sample, based on PLINK's --genome flag using 400,000 SNPs.
As you can see, Mongola has about the same amount of IBS with Saamis as with some S Asians, and not that much more than with the Iraqi Kurd or some Finns, which would be a shocker to you if you just went by calculator results.
<tbody>
NO FID1 FID2 IID2 PI_HAT IBS
1 S_Mongola-1 Korean S_Korean-1 0.157 0.81068
2 S_Mongola-1 Han S_Han-1 0.1538 0.81020
3 S_Mongola-1 Japanese S_Japanese-1 0.1603 0.80999
4 S_Mongola-1 Xibo S_Xibo-2 0.1463 0.80968
5 S_Mongola-1 Korean S_Korean-2 0.1546 0.80955
6 S_Mongola-1 Han S_Han-2 0.1562 0.80910
7 S_Mongola-1 Tujia S_Tujia-2 0.1522 0.80896
8 S_Mongola-1 Japanese S_Japanese-2 0.148 0.80880
9 S_Mongola-1 She S_She-1 0.1542 0.80875
10 S_Mongola-1 She S_She-2 0.1535 0.80870
11 S_Mongola-1 Naxi S_Naxi-1 0.1527 0.80869
12 S_Mongola-1 Japanese S_Japanese-3 0.1426 0.80865
13 S_Mongola-1 Hezhen S_Hezhen-2 0.1438 0.80863
14 S_Mongola-1 Yi S_Yi-1 0.1494 0.80853
15 S_Mongola-1 Xibo S_Xibo-1 0.1408 0.80837
16 S_Mongola-1 Miao S_Miao-2 0.1534 0.80827
17 S_Mongola-1 Kinh S_Kinh-1 0.1488 0.80800
18 S_Mongola-1 Naxi S_Naxi-3 0.1516 0.80795
19 S_Mongola-1 Hezhen S_Hezhen-1 0.1514 0.80782
20 S_Mongola-1 Tujia S_Tujia-1 0.1519 0.80772
21 S_Mongola-1 Mongola S_Mongola-2 0.1456 0.80755
22 S_Mongola-1 Miao S_Miao-1 0.1518 0.80748
23 S_Mongola-1 Ulchi S_Ulchi-1 0.1642 0.80746
24 S_Mongola-1 Oroqen S_Oroqen-1 0.1575 0.80745
25 S_Mongola-1 Yi S_Yi-2 0.1529 0.80724
26 S_Mongola-1 Daur S_Daur-2 0.1422 0.80716
27 S_Mongola-1 Ulchi S_Ulchi-2 0.1566 0.80713
28 S_Mongola-1 Oroqen S_Oroqen-2 0.1588 0.80693
29 S_Mongola-1 Dai S_Dai-1 0.1463 0.80672
30 S_Mongola-1 Even S_Even-3 0.1583 0.80661
31 S_Mongola-1 Dai S_Dai-2 0.1519 0.80603
32 S_Mongola-1 Tu S_Tu-2 0.1387 0.80580
33 S_Mongola-1 Kinh S_Kinh-2 0.1415 0.80574
34 S_Mongola-1 Thai S_Thai-2 0.1401 0.80573
35 S_Mongola-1 China_Lahu S_Lahu-1 0.1524 0.80558
36 S_Mongola-1 Burmese S_Burmese-1 0.1385 0.80540
37 S_Mongola-1 Tu S_Tu-1 0.1354 0.80530
38 S_Mongola-1 Ami.DG S_Ami1 0.1575 0.80503
39 S_Mongola-1 Ami.DG S_Ami2 0.1595 0.80502
40 S_Mongola-1 Even S_Even-2 0.1555 0.80488
41 S_Mongola-1 Yakut S_Yakut-1 0.1485 0.80419
42 S_Mongola-1 China_Lahu S_Lahu-2 0.1523 0.80397
43 S_Mongola-1 Igorot S_Igorot-2 0 0.80313
44 S_Mongola-1 Dusun S_Dusun-2 0 0.80309
45
S_Mongola-1
Dusun
S_Dusun-1
0
0.80308
46
S_Mongola-1
Thai
S_Thai-1
0.1275
0.80306
47
S_Mongola-1
Igorot
S_Igorot-1
0
0.80301
48
S_Mongola-1
Cambodian
S_Cambodian-1
0.1407
0.80241
49
S_Mongola-1
Even
S_Even-1
0.1214
0.80214
50
S_Mongola-1
Burmese
S_Burmese-2
0.1169
0.80213
51
S_Mongola-1
Yakut
S_Yakut-2
0.1438
0.80209
52
S_Mongola-1
Cambodian
S_Cambodian-2
0.134
0.80188
53
S_Mongola-1
Eskimo_Sireniki.DG
S_Sireniki1
0
0.80124
54
S_Mongola-1
Kyrgyz_Kyrgyzstan
S_Kyrgyz-1
0.1127
0.79908
55
S_Mongola-1
Kyrgyz_Kyrgyzstan
S_Kyrgyz-2
0.1005
0.79815
56
S_Mongola-1
Itelmen
S_Itelman-1
0
0.79809
57
S_Mongola-1
Eskimo_Naukan.DG
S_Naukan2
0
0.79789
58
S_Mongola-1
Eskimo_Chaplin.DG
S_Chaplin1
0
0.79770
59
S_Mongola-1
Eskimo_Naukan.DG
S_Naukan1
0
0.79751
60
S_Mongola-1
Eskimo_Sireniki.DG
S_Sireniki2
0
0.79749
61
S_Mongola-1
Kusunda
S_Kusunda-1
0.1132
0.79740
62
S_Mongola-1
Tubalar
S_Tubalar-2
0
0.79509
63
S_Mongola-1
Tubalar
S_Tubalar-1
0.1107
0.79490
64
S_Mongola-1
Chukchi
S_Chukchi-1
0.0841
0.79357
65
S_Mongola-1
Uyghur
S_Uygur-1
0.0898
0.79336
66
S_Mongola-1
Mexico_Zapotec.DG
S_Zapotec1
0
0.79282
67
S_Mongola-1
Mansi
S_Mansi-1
0
0.79238
68
S_Mongola-1
Hazara
S_Hazara-1
0
0.79204
69
S_Mongola-1
Pima
S_Pima-1
0
0.79198
70
S_Mongola-1
Uyghur
S_Uygur-2
0
0.79197
71
S_Mongola-1
Hazara
S_Hazara-2
0
0.79170
72
S_Mongola-1
Mayan
S_Mayan-2
0
0.79120
73
S_Mongola-1
Mixtec
S_Mixtec-1
0
0.79120
74
S_Mongola-1
Mixe
S_Mixe-2
0
0.79115
75
S_Mongola-1
Mexico_Zapotec.DG
S_Zapotec2
0
0.79101
76
S_Mongola-1
Mayan
S_Mayan-1
0
0.79087
77
S_Mongola-1
Quechua
S_Quechua-3
0
0.79075
78
S_Mongola-1
Mixe
S_Mixe-3
0
0.79044
79
S_Mongola-1
Piapoco
S_Piapoco-2
0
0.79029
80
S_Mongola-1
Quechua
S_Quechua-1
0
0.79023
81
S_Mongola-1
Quechua
S_Quechua-2
0
0.78995
82
S_Mongola-1
Pima
S_Pima-2
0
0.78978
83
S_Mongola-1
Mansi
S_Mansi-2
0
0.78962
84
S_Mongola-1
Khonda_Dora
S_Khonda_Dora-1
0
0.78847
85
S_Mongola-1
Tlingit
S_Tlingit-2
0
0.78816
86
S_Mongola-1
Mixtec
S_Mixtec-2
0
0.78811
87
S_Mongola-1
Maori
S_Maori-1
0.0542
0.78805
88
S_Mongola-1
Piapoco
S_Piapoco-1
0
0.78747
89
S_Mongola-1
Karitiana
S_Karitiana-2
0
0.78742
90
S_Mongola-1
Surui
S_Surui-1
0
0.78727
91
S_Mongola-1
Surui
S_Surui-2
0
0.78565
92
S_Mongola-1
Karitiana
S_Karitiana-1
0
0.78561
93
S_Mongola-1
Bengali
S_Bengali-1
0
0.78436
94
S_Mongola-1
Kusunda
S_Kusunda-2
0
0.78408
95
S_Mongola-1
Tlingit
S_Tlingit-1
0
0.78388
96
S_Mongola-1
Relli
S_Relli-1
0
0.78344
97
S_Mongola-1
Kapu
S_Kapu-2
0
0.78280
98
S_Mongola-1
Madiga
S_Madiga-1
0
0.78227
99
S_Mongola-1
Madiga
S_Madiga-2
0
0.78175
100
S_Mongola-1
Mala
S_Mala-3
0
0.78161
101
S_Mongola-1
Yadava
S_Yadava-1
0
0.78157
102
S_Mongola-1
Bengali
S_Bengali-2
0
0.78140
103
S_Mongola-1
Kapu
S_Kapu-1
0
0.78130
104
S_Mongola-1
Irula
S_Irula-2
0
0.78128
105
S_Mongola-1
Mala
S_Mala-2
0
0.78128
106
S_Mongola-1
Punjabi
S_Punjabi-1
0
0.78107
107
S_Mongola-1
Irula
S_Irula-1
0
0.78107
108
S_Mongola-1
Burusho
S_Burusho-2
0
0.78081
109
S_Mongola-1
Yadava
S_Yadava-2
0
0.78078
110
S_Mongola-1
Saami
S_Saami-1
0
0.78063
111
S_Mongola-1
Brahmin
S_Brahmin-2
0
0.78031
112
S_Mongola-1
Saami
S_Saami-2
0
0.78012
113
S_Mongola-1
Relli
S_Relli-2
0
0.77974
114
S_Mongola-1
Punjabi
S_Punjabi-3
0
0.77920
115
S_Mongola-1
Bougainville
S_Bougainville-1
0
0.77900
116
S_Mongola-1
Burusho
S_Burusho-1
0
0.77885
117
S_Mongola-1
Punjabi
S_Punjabi-2
0
0.77885
118
S_Mongola-1
Brahmin
S_Brahmin-1
0
0.77874
119
S_Mongola-1
Bougainville
S_Bougainville-2
0
0.77866
120
S_Mongola-1
Sindhi
S_Sindhi-2
0
0.77851
121
S_Mongola-1
Pathan
S_Pathan-1
0
0.77838
122
S_Mongola-1
Punjabi
S_Punjabi-4
0
0.77776
123
S_Mongola-1
Kurd-Iraq
WGS
0
0.77625
124
S_Mongola-1
Pathan
S_Pathan-2
0
0.77597
125
S_Mongola-1
Ossetian-North
S_Ossetian-1
0
0.77575
126
S_Mongola-1
Russian
S_Russian-1
0
0.77570
127
S_Mongola-1
Finnish
S_Finnish-1
0
0.77476
128
S_Mongola-1
Sindhi
S_Sindhi-1
0
0.77473
129
S_Mongola-1
Turkish-Kayseri
S_Turkish-Kayseri-1
0
0.77463
130
S_Mongola-1
Tajik
S_Tajik-2
0
0.77448
131
S_Mongola-1
YANA_UP_WGS
Yana1
0
0.77422
132
S_Mongola-1
Ossetian-North
S_Ossetian-2
0
0.77413
133
S_Mongola-1
Papuan
S_Papuan-10
0
0.77381
134
S_Mongola-1
Balochi
S_Balochi-2
0
0.77365
135
S_Mongola-1
Brahui
S_Brahui-1
0
0.77363
136
S_Mongola-1
Adygei
S_Adygei-1
0
0.77334
137
S_Mongola-1
Makrani
S_Makrani-1
0
0.77334
138
S_Mongola-1
Finnish
S_Finnish-3
0
0.77319
139
S_Mongola-1
Adygei
S_Adygei-2
0
0.77319
140
S_Mongola-1
Kalash
S_Kalash-2
0
0.77319
141
S_Mongola-1
Turkish-Kayseri
S_Turkish-Kayseri-2
0
0.77319
142
S_Mongola-1
Chechen
S_Chechen-1
0
0.77312
143
S_Mongola-1
Papuan
S_Papuan-9
0
0.77307
144
S_Mongola-1
Russian
S_Russian-2
0
0.77288
145
S_Mongola-1
Icelandic
S_Icelandic-1
0
0.77260
146
S_Mongola-1
Finnish
S_Finnish-2
0
0.77258
147
S_Mongola-1
Papuan
S_Papuan-12
0
0.77257
148
S_Mongola-1
Kalash
S_Kalash-1
0
0.77247
149
S_Mongola-1
Lezgin
S_Lezgin-1
0
0.77245
150
S_Mongola-1
Papuan
S_Papuan-8
0
0.77232
151
S_Mongola-1
Russia_Abkhasian
S_Abkhasian-1
0
0.77197
152
S_Mongola-1
Iranian-Fars
S_Iranian-Fars-1
0
0.77194
153
S_Mongola-1
Brahui
S_Brahui-2
0
0.77178
154
S_Mongola-1
Russia_Abkhasian
S_Abkhasian-2
0
0.77170
155
S_Mongola-1
Papuan
S_Papuan-1
0
0.77164
156
S_Mongola-1
Norwegian
S_Norwegian-1
0
0.77159
157
S_Mongola-1
Orcadian
S_Orcadian-2
0
0.77158
158
S_Mongola-1
Estonian
S_Estonian-1
0
0.77155
159
S_Mongola-1
Papuan
S_Papuan-7
0
0.77150
160
S_Mongola-1
Papuan
S_Papuan-11
0
0.77146
161
S_Mongola-1
Estonian
S_Estonian-2
0
0.77144
162
S_Mongola-1
Papuan
S_Papuan-13
0
0.77131
163
S_Mongola-1
Tajik
S_Tajik-1
0
0.77131
164
S_Mongola-1
Papuan
S_Papuan-14
0
0.77129
165
S_Mongola-1
Hungarian
S_Hungarian-2
0
0.77120
166
S_Mongola-1
Czech
S_Czech-2
0
0.77120
167
S_Mongola-1
Papuan
S_Papuan-3
0
0.77119
168
S_Mongola-1
Icelandic
S_Icelandic-2
0
0.77119
169
S_Mongola-1
Hungarian
S_Hungarian-1
0
0.77111
170
S_Mongola-1
Polish
S_Polish-1
0
0.77110
171
S_Mongola-1
Bulgarian
S_Bulgarian-1
0
0.77106
172
S_Mongola-1
Greek
S_Greek-1
0
0.77103
173
S_Mongola-1
Iranian-Fars
S_Iranian-Fars-2
0
0.77103
174
S_Mongola-1
Papuan
S_Papuan-5
0
0.77101
175
S_Mongola-1
French
S_French-2
0
0.77082
176
S_Mongola-1
Georgian
S_Georgian-1
0
0.77071
177
S_Mongola-1
Balochi
S_Balochi-1
0
0.77062
178
S_Mongola-1
Spanish
S_Spanish-1
0
0.77061
179
S_Mongola-1
Armenian
S_Armenian-1
0
0.77054
180
S_Mongola-1
Papuan
S_Papuan-6
0
0.77049
181
S_Mongola-1
Bergamo
S_Bergamo-2
0
0.77017
182
S_Mongola-1
Papuan
S_Papuan-2
0
0.77008
183
S_Mongola-1
Bulgarian
S_Bulgarian-2
0
0.77007
184
S_Mongola-1
Papuan
S_Papuan-4
0
0.77005
185
S_Mongola-1
Spanish
S_Spanish-2
0
0.76981
186
S_Mongola-1
Greek
S_Greek-2
0
0.76981
187
S_Mongola-1
Basque
S_Basque-1
0
0.76979
188
S_Mongola-1
English
S_English-1
0
0.76977
189
S_Mongola-1
Lezgin
S_Lezgin-2
0
0.76975
190
S_Mongola-1
Tuscan
S_Tuscan-2
0
0.76960
191
S_Mongola-1
Albanian.DG
S_Albanian1
0
0.76953
192
S_Mongola-1
English
S_English-2
0
0.76951
193
S_Mongola-1
Armenian
S_Armenian-2
0
0.76950
194
S_Mongola-1
Sardinian
S_Sardinian-2
0
0.76946
195
S_Mongola-1
Orcadian
S_Orcadian-1
0
0.76909
196
S_Mongola-1
Tuscan
S_Tuscan-1
0
0.76906
197
S_Mongola-1
Jew_Iraqi
S_Iraqi_Jew-1
0
0.76901
198
S_Mongola-1
Basque
S_Basque-2
0
0.76888
199
S_Mongola-1
Georgian
S_Georgian-2
0
0.76886
200
S_Mongola-1
Jew_Iraqi
S_Iraqi_Jew-2
0
0.76865
201
S_Mongola-1
Jordanian
S_Jordanian-3
0
0.76809
202
S_Mongola-1
French
S_French-1
0
0.76796
203
S_Mongola-1
BedouinB
S_BedouinB-2
0
0.76779
204
S_Mongola-1
Druze
S_Druze-1
0
0.76757
205
S_Mongola-1
Druze
S_Druze-2
0
0.76754
206
S_Mongola-1
Makrani
S_Makrani-2
0
0.76747
207
S_Mongola-1
Jew_Yemenite
S_Yemenite_Jew-2
0
0.76622
208
S_Mongola-1
Jew_Yemenite
S_Yemenite_Jew-1
0
0.76575
209
S_Mongola-1
Sardinian
S_Sardinian-1
0
0.76564
210
S_Mongola-1
BedouinB
S_BedouinB-1
0
0.76460
211
S_Mongola-1
Jordanian
S_Jordanian-2
0
0.76413
212
S_Mongola-1
Samaritan
S_Samaritan-1
0
0.76396
213
S_Mongola-1
Jordanian
S_Jordanian-1
0
0.76261
214
S_Mongola-1
Saharawi
S_Saharawi-2
0
0.75981
215
S_Mongola-1
Saharawi
S_Saharawi-1
0
0.75964
216
S_Mongola-1
Mozabite
S_Mozabite-1
0
0.75937
217
S_Mongola-1
Mozabite
S_Mozabite-2
0
0.75824
222
S_Mongola-1
Somali
S_Somali-1
0
0.74788
224
S_Mongola-1
Masai
S_Masai-2
0
0.74381
226
S_Mongola-1
Masai
S_Masai-1
0
0.74274
232
S_Mongola-1
Gambian
S_Gambian-2
0
0.73200
233
S_Mongola-1
BantuKenya
S_BantuKenya-1
0
0.73139
234
S_Mongola-1
Luo
S_Luo-2
0
0.73107
235
S_Mongola-1
BantuKenya
S_BantuKenya-2
0
0.73020
236
S_Mongola-1
Luhya
S_Luhya-1
0
0.73005
237
S_Mongola-1
Luhya
S_Luhya-2
0
0.73002
238
S_Mongola-1
Mandenka
S_Mandenka-2
0
0.72934
239
S_Mongola-1
Gambian
S_Gambian-1
0
0.72933
240
S_Mongola-1
Esan
S_Esan-2
0
0.72920
241
S_Mongola-1
Yoruba
S_Yoruba-2
0
0.72879
242
S_Mongola-1
Mandenka
S_Mandenka-1
0
0.72872
243
S_Mongola-1
Yoruba
S_Yoruba-1
0
0.72816
244
S_Mongola-1
Esan
S_Esan-1
0
0.72810
245
S_Mongola-1
Mende
S_Mende-1
0
0.72793
246
S_Mongola-1
Mende
S_Mende-2
0
0.72788
247
S_Mongola-1
Biaka
S_Biaka-1
0
0.72484
248
S_Mongola-1
Biaka
S_Biaka-2
0
0.72347
249
S_Mongola-1
Mbuti
S_Mbuti-3
0
0.72046
250
S_Mongola-1
Mbuti
S_Mbuti-1
0
0.72010
251
S_Mongola-1
Mbuti
S_Mbuti-2
0
0.72005
252
S_Mongola-1
Khomani_San
S_Khomani_San-2
0
0.71521
253
S_Mongola-1
Ju_hoan_North
S_Ju_hoan_North-2
0
0.71514
254
S_Mongola-1
Ju_hoan_North
S_Ju_hoan_North-3
0
0.71460
255
S_Mongola-1
Khomani_San
S_Khomani_San-1
0
0.71302
</tbody>
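For anyone wondering what an IBS similarity like the one in this list measures, here is a minimal sketch. It assumes the IBS column corresponds to PLINK's DST statistic, (IBS2 + 0.5*IBS1) / number of SNPs compared; the post doesn't say which --genome column was used, so treat that as a guess. The function name ibs_similarity and the 0/1/2 genotype strings are made up for illustration, and missing genotypes are not handled.

```shell
# ibs_similarity G1 G2
# G1 and G2 are hypothetical genotype strings, one character per SNP,
# each character being the count (0, 1 or 2) of the same reference allele.
ibs_similarity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = length(a)
    if (n == 0 || n != length(b)) { print "NA"; exit }
    shared = 0
    for (i = 1; i <= n; i++) {
      d = substr(a, i, 1) - substr(b, i, 1)
      if (d < 0) d = -d
      shared += 2 - d            # 2, 1 or 0 alleles identical by state at this SNP
    }
    # (2*IBS2 + 1*IBS1) / (2*N) is the same as (IBS2 + 0.5*IBS1) / N
    printf "%.5f\n", shared / (2 * n)
  }'
}
```

With 400,000 SNPs the strings would of course come from the genotype files rather than being typed by hand; the point is only that differences in the third decimal place, as in the list, reflect a few thousand alleles out of 800,000.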
Very interesting. How much East Eurasian ancestry for the Saamis can we infer from this IBS comparison?
Zanzibar
03-09-2021, 02:11 PM
I tried doing qpAdm models of the population named Saami.DG in the v44.3_HO dataset. I excluded models with one or more negative weights (where feasible is false) and sorted the models by their p-value.
I'm probably doing something wrong, and I still don't know how to pick the outgroups. I mostly just picked outgroups that resulted in little decrease in the number of SNPs that remained after filtering. I also tried to pick left populations that resulted in little decrease in the SNP count.
I got 374794 out of 597573 SNPs after filtering, out of which 349558 were polymorphic.
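To put those SNP counts in proportion, a small helper like the following can be used. snp_retention is a hypothetical name, not part of ADMIXTOOLS; it only divides the three numbers that the filtering step reports.

```shell
# snp_retention TOTAL KEPT POLYMORPHIC
# Prints the share of SNPs that survived filtering and the share of those
# that are polymorphic. Helper name and output wording are illustrative only.
snp_retention() {
  awk -v t="$1" -v k="$2" -v p="$3" 'BEGIN {
    if (t <= 0 || k <= 0) { print "NA"; exit }
    printf "%.1f%% of SNPs kept, %.1f%% of kept SNPs polymorphic\n", 100 * k / t, 100 * p / k
  }'
}
```

For the counts above, `snp_retention 597573 374794 349558` reports that roughly 63% of the SNPs were kept and about 93% of those were polymorphic.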
https://i.ibb.co/vcTCKNz/b.png
In the image above, the models with a p-value above .05 consistently show about 30-35% Nganasan ancestry. However, EHG, CHG, and SHG are also part Mongoloid, so if we consider Nganasan to be fully Mongoloid, Saami might be closer to 40% than 30% Mongoloid.
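The "30-35% direct, maybe 40% in total" reasoning can be turned around: if d is the direct Nganasan share and s is the average share carried by the other sources, then total = d + (1 - d) * s, so s = (total - d) / (1 - d). A rough sketch using only the figures in this post; implied_share is a hypothetical helper, and the 40% total is a guess rather than a measured value.

```shell
# implied_share DIRECT_PCT TOTAL_PCT
# Given the direct Nganasan percentage and an assumed total percentage,
# print the average percentage the remaining sources would have to carry.
implied_share() {
  awk -v d="$1" -v t="$2" 'BEGIN {
    if (d >= 100) { print "NA"; exit }
    printf "%.1f\n", 100 * (t - d) / (100 - d)
  }'
}
```

So a 40% total needs the non-Nganasan sources to average about 14% if the direct share is 30% (`implied_share 30 40`), or about 8% if it is 35% (`implied_share 35 40`).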
Both individuals in the population Saami.DG were from Utsjoki, which is part of the Northern Saami region within Finland:
$ awk 'NR==1||/Saami...DG/' g/v44.3_HO_public/v44.3_HO_public.anno|cut -f2,4,9,10|tr \\t \;
Version ID;Publication (or OK to use in a paper);Locality;Country
S_Saami-1.DG;MallickNature2016;Utsjoki;Finland
S_Saami-2.DG;MallickNature2016;Utsjoki;Finland
Among Finnish Saami, there are an estimated 2,000 speakers of Northern Saami, 300 speakers of Inari Saami, and 300 speakers of Skolt Saami (https://fi.wikipedia.org/wiki/Saamelaiskielet). Out of four groups of Saami measured by Karin Mark, Skolt Saami had the lightest pigmentation, followed by Inari Saami, Finnish Northern Saami, and Kola Saami (https://www.etis.ee/Portal/Publications/Display/1fd319c0-7408-4e31-9f18-b9b3010eabad).
Scandinavian Northern Saami might be even more Mongoloid than Finnish Northern Saami, or at least Coon wrote that the Saami of the Scandinavian inland were the darkest and most brachycephalic (https://www.theapricity.com/snpa/chapter-IX2.htm):
The selected "pure" groups, Bryn's Reindeer Lapps, and some of Geyer's mountain and forest Lapps from Sweden, have seventy per cent or over of this dark hair, while the fairest Lapps, with a majority of brown and blond shades, are found in Finland and in the Kola Peninsula.
Pure dark eyes are found among one-third of Reindeer Lapps, and among as few as eight per cent in the total of Lapps from Norway.[14] Pure light and light-mixed eyes are commonest among the Lapps of Finland, where they total between thirty and forty per cent, and least common among the Reindeer Lapps of interior Norway and Sweden. Even among the purest selected sub-groups, such as that of Geyer, who isolated from a larger Swedish Lapp sample a few individuals of most pronounced Lappish type, at least a third are light or light-mixed in iris color. [...]
There are, however, regional differences; the center of extreme round headedness lies among the inland groups in northern Norway, while the Swedish, Finnish, and Kola Peninsula Lapps become progressively narrower headed. The mean for the purest Reindeer Lapps of Norway is 87; for the easternmost Lapps, 80 to 83.
Code for ADMIXTOOLS 2:
library(admixtools)
library(tidyverse)
target="Saami.DG"
left=c("Turkey_Boncuklu_N.SG","Armenia_Caucasus_KuraAraxes","Latvia_HG","Sweden_Motala_HG","Russia_HG_Karelia","Russia_HG_Tyumen","Nganasan")
right=c("Mbuti.DG","Mixe.DG","Ami.DG","Papuan.DG","Chimp.REF","Ju_hoan_North","Biaka.DG","Yoruba.DG","Altai_Neanderthal.DG")
pops=c(left,right,target)
unlink("/tmp/f2",recursive=T)
extract_f2(pref="g/v44.3_HO_public/v44.3_HO_public",pops=pops,outdir="/tmp/f2")
f2=f2_from_precomp("/tmp/f2")
qp=qpadm(f2,left=left,right=right,target=target)
qp2=qp$popdrop%>%dplyr::filter(feasible==T&f4rank!=0)%>%arrange(desc(p))%>%dplyr::select(!c(wt,dof,chisq,f4rank,dofdiff,chisqdiff,p_nested,feasible,best))
write_csv(qp2,"/tmp/qp")
Code to generate the bar chart:
library(tidyverse)
library(reshape2)
library(colorspace)
t=read_csv("/tmp/qp")
# t=t[t$p>.05,]
pvalue=sub("^0","",sprintf("%.3f",t$p))
t=t[-2]
t2=melt(t,id.var="pat")
ggplot(t2,aes(x=fct_rev(factor(pat,levels=t$pat)),y=value,fill=variable))+
geom_bar(stat="identity",width=1,position=position_fill(reverse=T))+
geom_text(aes(label=round(100*value)),position=position_stack(vjust=.5,reverse=T),size=3.5)+
coord_flip()+
theme(
axis.text.x=element_blank(),
axis.text=element_text(color="black"),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
legend.box.just="center",
legend.box.margin=margin(0),
legend.box.spacing=unit(.05,"in"),
legend.direction="vertical",
legend.justification="center",
legend.margin=margin(0),
legend.text=element_text(size=12),
legend.title=element_blank(),
panel.border=element_blank(),
text=element_text(size=16)
)+
xlab("")+
scale_x_discrete(labels=rev(pvalue),expand=c(0,0)) +
scale_y_discrete(expand=c(0,0))+
scale_fill_manual("legend",values=hex(HSV(c(45,45,210,210,120,120,300),c(.6, .6,.6,.6,.6,.6,.6),c(1,.6,1,.6,1,.6,1))))
ggsave("/tmp/a.png",width=7,height=7)
Very interesting. Is Saami.DG the same as the Saami samples in G25? Could you also try modeling the Saami and Mari with qpAdm to see how much EEF they have?
It's pretty amazing that the above IBS list was able to properly order Mbuti, Khomani, and Ju-Hoan in terms of IBS with Mongola.
Does anyone know why Mongola is slightly closer to Mbuti than to Khomani and Ju-Hoan?
Hint: The late paleolithic African paper :)
PROOF G25 distances shouldn't be trusted
The late Paleolithic African paper showed that there was Eurasian geneflow back to Africa in the Paleolithic that affected pretty much all Africans including Mbuti. In other words even Mbuti got some Eurasian genes during the Paleolithic. Least affected were Khomani and Ju-Hoan.
The IBS list I posted accurately shows this by showing Mongola closer to Mbuti than to Khomani and Ju-Hoan.
The G25 (scaled), on the other hand, gets it all wrong. You can try it yourself. It wrongly shows Mongola significantly closer to Khomani_San than to Mbuti! If it gets this wrong, then how can its distances for other pops be trusted?
Distance to: Mongola
0.918673 Khomani_San
0.98425066 Ju_hoan_North
0.99607508 Mbuti
BTW, I just checked, and the distances from Mongola to Kurds vs Chechens vs Balochis vs Iranians are also screwed up in G25. From the African paper, showing that the IBS list is correct and G25 is wrong:
In addition, we find that the Mbuti and Biaka, both Central African hunter-gatherer populations, show levels of Eurasian gene flow that are intermediate between levels observed in the Khoe-San and Yorubans (Fig. 1a,b, Supplemental Table S1)
https://i.imgur.com/VvE3LBX.jpg
Mbuti has more Eurasian admixture than Khomani and Ju-Hoan
https://i.imgur.com/NtvoN9W.jpg
https://i.imgur.com/OPF1NxR.jpg
https://i.imgur.com/ZaUlzUP.jpg
I wonder why the Mari have like 3-3.5 times more East Eurasian than the Mordovians (~30% vs ~10%). Those two republics are not even that far away from each other.
Komintasavalta
03-09-2021, 09:20 PM
PROOF G25 distances shouldn't be trusted
The late Paleolithic African paper showed that there was Eurasian geneflow back to Africa in the Paleolithic that affected pretty much all Africans including Mbuti. In other words even Mbuti got some Eurasian genes during the Paleolithic. Least affected were Khomani and Ju-Hoan.
The IBS list I posted accurately shows this by showing Mongola closer to Mbuti than to Khomani and Ju-Hoan.
The G25 (scaled), on the other hand, gets it all wrong. You can try it yourself. It wrongly shows Mongola significantly closer to Khomani_San than to Mbuti! If it gets this wrong, then how can its distances for other pops be trusted?
Distance to: Mongola
0.918673 Khomani_San
0.98425066 Ju_hoan_North
0.99607508 Mbuti
Based on FST distances from 1240K, Mongola were also further from San than from Mbuti:
> fst=fst("g/v44.3_1240K_public/v44.3_1240K_public",c("Biaka.DG","Ju_hoan_North.DG","Khomani_San.DG","Mbuti.DG","Mongola.DG"))
> f2m=function(x){t=as.data.frame(x)[,1:3];t2=rbind(t,setNames(t[,c(2,1,3)],names(t)));xtabs(t2[,3]~t2[,2]+t2[,1])}
> r=sort(f2m(fst)["Mongola.DG",]);cat(paste(sprintf("%.4f",r),names(r)),sep="\n")
0.0000 Mongola.DG
0.2014 Biaka.DG
0.2309 Mbuti.DG
0.2482 Ju_hoan_North.DG
0.2571 Khomani_San.DG
The f2m function converts f2 or FST pairs to a square matrix.
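For readers who don't use R, the same pair-list-to-square-matrix step can be sketched in awk. This is only an illustration of the idea, not a port of f2m: pairs_to_matrix is a made-up name, it expects whitespace-separated "pop1 pop2 value" lines on stdin, and it fills the diagonal (and any missing pair) with 0.

```shell
# pairs_to_matrix: read "pop1 pop2 value" lines and print a symmetric matrix,
# one row per population, with populations in sorted order.
pairs_to_matrix() {
  awk '
    { v[$1 SUBSEP $2] = $3; v[$2 SUBSEP $1] = $3; seen[$1]; seen[$2] }
    END {
      n = 0
      for (p in seen) names[++n] = p
      # insertion sort so that row and column order is deterministic
      for (i = 2; i <= n; i++) {
        x = names[i]
        for (j = i - 1; j >= 1 && names[j] > x; j--) names[j + 1] = names[j]
        names[j + 1] = x
      }
      for (i = 1; i <= n; i++) {
        line = names[i]
        for (j = 1; j <= n; j++) {
          key = names[i] SUBSEP names[j]
          line = line " " ((key in v) ? v[key] : 0)
        }
        print line
      }
    }'
}
```

Once the pairs are in matrix form, pulling out one row and sorting it gives a distance list like the one printed for Mongola.DG above.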
However based on unscaled G25 distances, Mongola are closer to Mbuti than to Ju_hoan_North:
$ mkdir -p g/25;printf %s\\n ai\ 1UrhcfNMLW0oMXIbHGUE60v2taCM7PFw1 aa\ 1F2rKEVtu8nWSm7qFhxPU6UESQNsmA-sl mi\ 1HYrDwxEXv82DvDLoq736pS5ZTGJA4dn5 ma\ 1wZr-UOve0KUKo_Qbgeo27m-CQncZWb8y aiu\ 1YKkEOtyV5SISvmY_FyS4YSLXCxxYt5_W aau\ 1f0imQyVNZ9RPESNAYIeIkA8fx4wAVNYo miu\ 18GcEVEl3GI-ByviD-TgQQjvEaaTbNTr2 mau\ 1y49hyvviJpHj9esVqyeiFm32DhnPlfRQ|while read l m;do curl "drive.google.com/uc?export=download&id=$m" -Lso g/25/$l;done
$ dist(){ awk -F, 'NR==FNR{for(i=2;i<=NF;i++)a[i]=$i;next}$1{s=0;for(i=2;i<=NF;i++)s+=($i-a[i])^2;print s^.5,$1}' "$2" "$1"|sort -n|awk '{printf"%.3f %s\n",$1,$2}'|sed s,^0,,;}
$ dist g/25/mau <(grep Mongola g/25/mau)|tail
.225 Piapoco
.228 Wichi
.234 Kosipe
.236 Karitiana
.241 Koinanbe
.241 Papuan
.241 Surui
.274 Khomani_San
.281 Mbuti
.388 Ju_hoan_North
$ dist g/25/ma <(grep Mongola g/25/ma)|tail
.831 Igbo
.831 Yoruba
.835 Esan_Nigeria
.859 Bedzan
.864 Bakola
.867 Baka
.881 Biaka
.919 Khomani_San
.984 Ju_hoan_North
.996 Mbuti
Komintasavalta
03-09-2021, 09:45 PM
11- Drop Nganasan from the sources and add Shamanka-EN instead. It's always better to keep the sources consistently ancient. Shamanka would be more ancestral to Uralics than Nganasan.
A qpAdm model in Jeong et al. 2019 ("The genetic history of admixture across inner Eurasia") used Nganasan as a source to model Uralics: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6542712/figure/F5/.
7- I would add Iran-N to pright
When I added it, KuraAraxes became the main component in my models for Saami.DG:
https://i.ibb.co/WH9M5Nj/saami.png
But I guess if I don't go with the strategy where I just pick Africans and chimpanzees as outgroups, then I'm supposed to select an outgroup related to each left population.
Anyway, I tried implementing your suggestions. I used these outgroups: Belgium_UP_GoyetQ116_1_published_all, Iran_GanjDareh_N, Morocco_Iberomaurusian, Russia_HG_Tyumen, Russia_Kolyma_M.SG, Russia_Kostenki14, Switzerland_Bichon.SG. I don't know how to pick just a single individual from a population as an outgroup.
I now got "597573 SNPs read in total" with "244600 SNPs remain after filtering. 190241 are polymorphic."
The proportion of the Mongoloid ancestry dropped, but maybe I should've used kra001 instead of Shamanka and Devil's Cave as a Mongoloid source.
https://i.ibb.co/WWnM3WT/zoro.png
On the last two rows of the image above, Saami are modeled as 48% Latvia_HG and 52% Shamanka/DevilsCave, which doesn't seem right. When I added two additional outgroups that were related to Latvia_HG (Norway_N_HG.SG) and to Devil's Cave (Russia_MN_Boisman), the Mongoloid ancestry dropped to about 30-35% in the two-way models of Latvia_HG/Motala + Shamanka/DevilsCave. It increased the p values of the first models, but it also reduced their proportion of Mongoloid ancestry.
https://i.ibb.co/7VmtKQS/zoro2.png
BTW I didn't realize earlier that the anno files have a column for SNP counts:
$ cut -f2,13,21 g/v44.3_1240K_public/v44.3_1240K_public.anno|awk 'NR==1||/Saami/'|tr \\t \;
Version ID;GroupID;SNPs hit on autosomal targets
S_Saami-1.DG;Saami.DG;1119750
S_Saami-2.DG;Saami.DG;1120268
Saami.SG;Finland_Saami_Modern.SG;1128484
DA237.SG;Finland_Saami_IA.SG;110265
$ cut -f2,8,15 g/v44.3_HO_public/v44.3_HO_public.anno|awk 'NR==1||/Saami/'|tr \\t \;
Version ID;Group Label;SNPs hit on autosomal targets
SD60_297;Saami.WGA;588596
S_Saami-1.DG;Saami.DG;584416
S_Saami-2.DG;Saami.DG;584732
Saami.SG;Finland_Saami_Modern.SG;569246
DA237.SG;Finland_Saami_IA.SG;59111
travv
03-09-2021, 10:22 PM
I wonder why the Mari have like 3-3.5 times more East Eurasian than the Mordovians (~10% vs ~30%). Those two republics are not even that far away from each other.
Because Mordovians carry more EEF-related ancestry, like Slavs, Balts, Hungarians, Germanics and other Europeans.
I doubt, however, that Mordovians have 10% East Eurasian. That's too high for them.
Komintasavalta
03-09-2021, 11:53 PM
I wonder why the Mari have like 3-3.5 times more East Eurasian than the Mordovians (~10% vs ~30%). Those two republics are not even that far away from each other.
Even a few degrees of latitude matter.
Based on YHG, Mordvins cluster together with Tatars and Central and Southern Russians, and they have little N by Finno-Permic standards.
https://i.ibb.co/fYj0CSQ/tambets-yhg.png
library(pheatmap)
library(tidyverse)
library(colorspace) # for hex
library(vegan) # for reorder.hclust
download.file("https://pastebin.com/raw/jFpVY4Wv","tambetsyhg")
t=read.csv("tambetsyhg",header=T,row.names=1,check.names=F)
pop=c("Bashkirs","Chuvashes","Enets","Estonians","Finns","Hungarians","Karelians","Khanty","Komis","Latvians","Lithuanians","Mansis","Maris","Mordovians","Nenets","Nganasans","Russians Central","Russians North","Russians South","Saami from Kola Peninsula","Saami from Sweden","Selkups","Swedes","Tatars","Udmurts","Vepsians")
t=t[pop,]
t=t%>%select_if(colSums(.)>=4)
weight=rowSums(t[,c("N(xN3)1# (M231)","N32# (TAT/M178)","C3 (M217)","P+Q+R*+R2 (M74/M242/M207/M124)")])
sort=reorder(hclust(dist(t)),wts=weight)
pheatmap(
t,
filename="output.png",
clustering_callback=function(...){c(sort)},
cluster_cols=F,
legend=F,
treeheight_row=80,
cellwidth=16,
cellheight=16,
fontsize=8,
border_color=NA,
display_numbers=T,
number_format="%.0f",
fontsize_number=7,
number_color="black",
colorRampPalette(hex(HSV(c(210,210,160,120,60,40,20,0,0),c(0,.5,.5,.5,.5,.5,.5,.5,.5),c(1,1,1,1,1,1,1,1,.7))))(256)
)
Zanzibar
03-10-2021, 01:47 AM
Are these all individual samples? Several of them seem to be majority EEF/Boncuklu-derived with slightly lower Mongoloid ancestry, while others completely lack the Anatolian ancestry.
Yes, you should also include kra001. You should also replace Armenia_Kura_Araxes with Georgia_CHG and Iran_Wezmeh_N, because Kura_Araxes also contains Anatolian admixture, which hides the actual amount of EEF ancestry in Saamis. Also try replacing Boncuklu_N with Barcin_N to see if the EEF level decreases.
Mingle
03-10-2021, 03:55 AM
PROOF G25 distances shouldn't be trusted
The late Paleolithic African paper showed that there was Eurasian geneflow back to Africa in the Paleolithic that affected pretty much all Africans including Mbuti. In other words even Mbuti got some Eurasian genes during the Paleolithic. Least affected were Khomani and Ju-Hoan.
The IBS list I posted accurately shows this by showing Mongola closer to Mbuti than to Khomani and Ju-Hoan.
The G25 (scaled) on the other hand gets it all wrong. You can try it yourself. It wrongly shows Mongola significantly closer to Khomani-San than Mbuti ! If it gets this wrong then how should the pops be trusted.
Distance to: Mongola
0.918673 Khomani_San
0.98425066 Ju_hoan_North
0.99607508 Mbuti
Can you link the paper?
Komintasavalta
03-10-2021, 08:09 AM
Are these all individual samples?
No, they're alternative models for a Saami population average that consists of two Northern Saami individuals from Finland. The models use different combinations of source populations and they are sorted by p-value (which indicates how feasible the models are). I excluded models where one or more source populations had a negative weight.
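The filtering and sorting step can be sketched in shell like this. It is only an illustration under assumptions: the function name `feasible_models`, the "model p weight1 weight2 ..." layout, and the three rows in the usage line are made up for the example and are not actual qpAdm output. `sort -g` is the general-numeric option of GNU and BSD sort, used so that p-values in scientific notation also sort correctly.

```shell
# feasible_models (hypothetical helper): reads "model p weight1 weight2 ..." lines,
# drops every model that has at least one negative weight, and sorts the remaining
# models by p-value in descending order.
feasible_models() {
  awk '{ for (i = 3; i <= NF; i++) if ($i < 0) next; print }' | sort -k2,2gr
}

# Usage with made-up rows (m2 is dropped because of its negative weight):
printf '%s\n' 'm1 0.31 0.48 0.52' 'm2 0.72 1.10 -0.10' 'm3 0.05 0.70 0.30' | feasible_models
```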
Actually you're probably not supposed to use qpAdm the way I did, but you're supposed to pick different outgroups for each model using something like the `qpadm_rotate` function (https://uqrmaie1.github.io/admixtools/articles/admixtools.html):
`qpadm_rotate()` tests many `qpadm()` models at a time. For each model, the `leftright` populations will be split into two groups: The first group will be the left populations passed to `qpadm()`, while the second group will be added to `rightfix` and become the set of right populations.
I used KuraAraxes because Georgia_Kotias.SG / KK1.SG (GEO_CHG on G25) gave me a lower SNP count. Vahaduo often gave me KuraAraxes as a Caucasus-related source for Uralic models.
Anyway, I'm learning ADMIXTURE now. It seems easier than qpAdm because you don't have to pick outgroups.
I downloaded the tar file from here: https://www.gnxp.com/WordPress/2018/07/11/tutorial-to-run-pca-admixture-treemix-and-pairwise-fst-in-one-command/. I downloaded Mac binaries for ADMIXTURE and plink: http://dalexander.github.io/admixture/download.html, https://www.cog-genomics.org/plink/1.9/.
I picked populations from this list:
cut -d' ' -f1 ancestry/Est1000HGDP.fam|sort -u
Next I ran commands like this:
n=travvscale;k=3
plink --bfile ancestry/Est1000HGDP --keep <(printf %s\\n Armenians Bulgarians Ukranians Russian Komi Maris Nenet|awk 'NR==FNR{a[$0];next}$1 in a' - ancestry/Est1000HGDP.fam) --make-bed --out $n
admixture -j8 $n.bed $k
paste -d' ' <(cut -d' ' -f1,2 $n.fam) $n.$k.Q>$n.$k
Then I ran this in R:
library(tidyverse)
library(colorspace) # for hex()
t=read.table("~/travvscale.3",sep=" ",header=F)
t=t[,c(1,4,3,5)]
names(t)=seq(length(t))
ave=aggregate(t[,-1],list(t[,1]),mean)
ave=ave[order(ave[,4]-ave[,2]),]
ave2=pivot_longer(ave,cols=2:ncol(ave))
ggplot(ave2,aes(x=fct_rev(factor(Group.1,level=unique(Group.1))),y=value,fill=name))+
geom_bar(stat="identity",width=1,position=position_fill(reverse=T))+
geom_text(aes(label=round(100*value)),position=position_stack(vjust=.5,reverse=T),size=3.5)+
coord_flip()+
theme(axis.text=element_text(color="black"),axis.ticks=element_blank(),axis.title.x=element_blank(),legend.position="none",text=element_text(size=16))+
xlab("")+
scale_x_discrete(expand=c(0,0))+
scale_y_discrete(expand=c(0,0))+
scale_fill_manual("legend",values=hex(HSV(c(30,210,300),c(.5),c(1))))
ggsave("output.png",width=5,height=2.5)
Result:
https://i.ibb.co/K20Rgz5/travv-scale-admixture.png
Lucas
03-10-2021, 03:45 PM
Est1000HGDP.fam
Did you create the merged dataset yourself, or did you find it somewhere?
LorenzoSpitaleri
03-10-2021, 03:48 PM
I didn't know Udmurts had such high Mongoloid ancestry considering the predominance of red hair in them
Can you link the paper?
Here is the supp
https://www.biorxiv.org/content/biorxiv/early/2020/06/01/2020.06.01.127555/DC1/embed/media-1.pdf?download=true
Here is the paper
https://www.biorxiv.org/content/10.1101/2020.06.01.127555v1.full.pdf
NO;FID1;FID2;IID2;PI_HAT;IBS
1;S_Mongola-1;Korean;S_Korean-1;0.157;0.81068
2;S_Mongola-1;Han;S_Han-1;0.1538;0.81020
3;S_Mongola-1;Japanese;S_Japanese-1;0.1603;0.80999
4;S_Mongola-1;Xibo;S_Xibo-2;0.1463;0.80968
5;S_Mongola-1;Korean;S_Korean-2;0.1546;0.80955
6;S_Mongola-1;Han;S_Han-2;0.1562;0.80910
7;S_Mongola-1;Tujia;S_Tujia-2;0.1522;0.80896
8;S_Mongola-1;Japanese;S_Japanese-2;0.148;0.80880
9;S_Mongola-1;She;S_She-1;0.1542;0.80875
10;S_Mongola-1;She;S_She-2;0.1535;0.80870
11;S_Mongola-1;Naxi;S_Naxi-1;0.1527;0.80869
12;S_Mongola-1;Japanese;S_Japanese-3;0.1426;0.80865
13;S_Mongola-1;Hezhen;S_Hezhen-2;0.1438;0.80863
14;S_Mongola-1;Yi;S_Yi-1;0.1494;0.80853
15;S_Mongola-1;Xibo;S_Xibo-1;0.1408;0.80837
16;S_Mongola-1;Miao;S_Miao-2;0.1534;0.80827
17;S_Mongola-1;Kinh;S_Kinh-1;0.1488;0.80800
18;S_Mongola-1;Naxi;S_Naxi-3;0.1516;0.80795
19;S_Mongola-1;Hezhen;S_Hezhen-1;0.1514;0.80782
20;S_Mongola-1;Tujia;S_Tujia-1;0.1519;0.80772
21;S_Mongola-1;Mongola;S_Mongola-2;0.1456;0.80755
22;S_Mongola-1;Miao;S_Miao-1;0.1518;0.80748
23;S_Mongola-1;Ulchi;S_Ulchi-1;0.1642;0.80746
24;S_Mongola-1;Oroqen;S_Oroqen-1;0.1575;0.80745
25;S_Mongola-1;Yi;S_Yi-2;0.1529;0.80724
26;S_Mongola-1;Daur;S_Daur-2;0.1422;0.80716
27;S_Mongola-1;Ulchi;S_Ulchi-2;0.1566;0.80713
28;S_Mongola-1;Oroqen;S_Oroqen-2;0.1588;0.80693
29;S_Mongola-1;Dai;S_Dai-1;0.1463;0.80672
30;S_Mongola-1;Even;S_Even-3;0.1583;0.80661
31;S_Mongola-1;Dai;S_Dai-2;0.1519;0.80603
32;S_Mongola-1;Tu;S_Tu-2;0.1387;0.80580
33;S_Mongola-1;Kinh;S_Kinh-2;0.1415;0.80574
34;S_Mongola-1;Thai;S_Thai-2;0.1401;0.80573
35;S_Mongola-1;China_Lahu;S_Lahu-1;0.1524;0.80558
36;S_Mongola-1;Burmese;S_Burmese-1;0.1385;0.80540
37;S_Mongola-1;Tu;S_Tu-1;0.1354;0.80530
38;S_Mongola-1;Ami.DG;S_Ami1;0.1575;0.80503
39;S_Mongola-1;Ami.DG;S_Ami2;0.1595;0.80502
40;S_Mongola-1;Even;S_Even-2;0.1555;0.80488
41;S_Mongola-1;Yakut;S_Yakut-1;0.1485;0.80419
42;S_Mongola-1;China_Lahu;S_Lahu-2;0.1523;0.80397
43;S_Mongola-1;Igorot;S_Igorot-2;0;0.80313
44;S_Mongola-1;Dusun;S_Dusun-2;0;0.80309
45;S_Mongola-1;Dusun;S_Dusun-1;0;0.80308
46;S_Mongola-1;Thai;S_Thai-1;0.1275;0.80306
47;S_Mongola-1;Igorot;S_Igorot-1;0;0.80301
48;S_Mongola-1;Cambodian;S_Cambodian-1;0.1407;0.80241
49;S_Mongola-1;Even;S_Even-1;0.1214;0.80214
50;S_Mongola-1;Burmese;S_Burmese-2;0.1169;0.80213
51;S_Mongola-1;Yakut;S_Yakut-2;0.1438;0.80209
52;S_Mongola-1;Cambodian;S_Cambodian-2;0.134;0.80188
53;S_Mongola-1;Eskimo_Sireniki.DG;S_Sireniki1;0;0.80124
54;S_Mongola-1;Kyrgyz_Kyrgyzstan;S_Kyrgyz-1;0.1127;0.79908
55;S_Mongola-1;Kyrgyz_Kyrgyzstan;S_Kyrgyz-2;0.1005;0.79815
56;S_Mongola-1;Itelmen;S_Itelman-1;0;0.79809
57;S_Mongola-1;Eskimo_Naukan.DG;S_Naukan2;0;0.79789
58;S_Mongola-1;Eskimo_Chaplin.DG;S_Chaplin1;0;0.79770
59;S_Mongola-1;Eskimo_Naukan.DG;S_Naukan1;0;0.79751
60;S_Mongola-1;Eskimo_Sireniki.DG;S_Sireniki2;0;0.79749
61;S_Mongola-1;Kusunda;S_Kusunda-1;0.1132;0.79740
62;S_Mongola-1;Tubalar;S_Tubalar-2;0;0.79509
63;S_Mongola-1;Tubalar;S_Tubalar-1;0.1107;0.79490
64;S_Mongola-1;Chukchi;S_Chukchi-1;0.0841;0.79357
65;S_Mongola-1;Uyghur;S_Uygur-1;0.0898;0.79336
66;S_Mongola-1;Mexico_Zapotec.DG;S_Zapotec1;0;0.79282
67;S_Mongola-1;Mansi;S_Mansi-1;0;0.79238
68;S_Mongola-1;Hazara;S_Hazara-1;0;0.79204
69;S_Mongola-1;Pima;S_Pima-1;0;0.79198
70;S_Mongola-1;Uyghur;S_Uygur-2;0;0.79197
71;S_Mongola-1;Hazara;S_Hazara-2;0;0.79170
72;S_Mongola-1;Mayan;S_Mayan-2;0;0.79120
73;S_Mongola-1;Mixtec;S_Mixtec-1;0;0.79120
74;S_Mongola-1;Mixe;S_Mixe-2;0;0.79115
75;S_Mongola-1;Mexico_Zapotec.DG;S_Zapotec2;0;0.79101
76;S_Mongola-1;Mayan;S_Mayan-1;0;0.79087
77;S_Mongola-1;Quechua;S_Quechua-3;0;0.79075
78;S_Mongola-1;Mixe;S_Mixe-3;0;0.79044
79;S_Mongola-1;Piapoco;S_Piapoco-2;0;0.79029
80;S_Mongola-1;Quechua;S_Quechua-1;0;0.79023
81;S_Mongola-1;Quechua;S_Quechua-2;0;0.78995
82;S_Mongola-1;Pima;S_Pima-2;0;0.78978
83;S_Mongola-1;Mansi;S_Mansi-2;0;0.78962
84;S_Mongola-1;Khonda_Dora;S_Khonda_Dora-1;0;0.78847
85;S_Mongola-1;Tlingit;S_Tlingit-2;0;0.78816
86;S_Mongola-1;Mixtec;S_Mixtec-2;0;0.78811
87;S_Mongola-1;Maori;S_Maori-1;0.0542;0.78805
88;S_Mongola-1;Piapoco;S_Piapoco-1;0;0.78747
89;S_Mongola-1;Karitiana;S_Karitiana-2;0;0.78742
90;S_Mongola-1;Surui;S_Surui-1;0;0.78727
91;S_Mongola-1;Surui;S_Surui-2;0;0.78565
92;S_Mongola-1;Karitiana;S_Karitiana-1;0;0.78561
93;S_Mongola-1;Bengali;S_Bengali-1;0;0.78436
94;S_Mongola-1;Kusunda;S_Kusunda-2;0;0.78408
95;S_Mongola-1;Tlingit;S_Tlingit-1;0;0.78388
96;S_Mongola-1;Relli;S_Relli-1;0;0.78344
97;S_Mongola-1;Kapu;S_Kapu-2;0;0.78280
98;S_Mongola-1;Madiga;S_Madiga-1;0;0.78227
99;S_Mongola-1;Madiga;S_Madiga-2;0;0.78175
100;S_Mongola-1;Mala;S_Mala-3;0;0.78161
101;S_Mongola-1;Yadava;S_Yadava-1;0;0.78157
102;S_Mongola-1;Bengali;S_Bengali-2;0;0.78140
103;S_Mongola-1;Kapu;S_Kapu-1;0;0.78130
104;S_Mongola-1;Irula;S_Irula-2;0;0.78128
105;S_Mongola-1;Mala;S_Mala-2;0;0.78128
106;S_Mongola-1;Punjabi;S_Punjabi-1;0;0.78107
107;S_Mongola-1;Irula;S_Irula-1;0;0.78107
108;S_Mongola-1;Burusho;S_Burusho-2;0;0.78081
109;S_Mongola-1;Yadava;S_Yadava-2;0;0.78078
110;S_Mongola-1;Saami;S_Saami-1;0;0.78063
111;S_Mongola-1;Brahmin;S_Brahmin-2;0;0.78031
112;S_Mongola-1;Saami;S_Saami-2;0;0.78012
113;S_Mongola-1;Relli;S_Relli-2;0;0.77974
114;S_Mongola-1;Punjabi;S_Punjabi-3;0;0.77920
115;S_Mongola-1;Bougainville;S_Bougainville-1;0;0.77900
116;S_Mongola-1;Burusho;S_Burusho-1;0;0.77885
117;S_Mongola-1;Punjabi;S_Punjabi-2;0;0.77885
118;S_Mongola-1;Brahmin;S_Brahmin-1;0;0.77874
119;S_Mongola-1;Bougainville;S_Bougainville-2;0;0.77866
120;S_Mongola-1;Sindhi;S_Sindhi-2;0;0.77851
121;S_Mongola-1;Pathan;S_Pathan-1;0;0.77838
122;S_Mongola-1;Punjabi;S_Punjabi-4;0;0.77776
123;S_Mongola-1;Kurd-Iraq;WGS;0;0.77625
124;S_Mongola-1;Pathan;S_Pathan-2;0;0.77597
125;S_Mongola-1;Ossetian-North;S_Ossetian-1;0;0.77575
126;S_Mongola-1;Russian;S_Russian-1;0;0.77570
127;S_Mongola-1;Finnish;S_Finnish-1;0;0.77476
128;S_Mongola-1;Sindhi;S_Sindhi-1;0;0.77473
129;S_Mongola-1;Turkish-Kayseri;S_Turkish-Kayseri-1;0;0.77463
130;S_Mongola-1;Tajik;S_Tajik-2;0;0.77448
131;S_Mongola-1;YANA_UP_WGS;Yana1;0;0.77422
132;S_Mongola-1;Ossetian-North;S_Ossetian-2;0;0.77413
133;S_Mongola-1;Papuan;S_Papuan-10;0;0.77381
134;S_Mongola-1;Balochi;S_Balochi-2;0;0.77365
135;S_Mongola-1;Brahui;S_Brahui-1;0;0.77363
136;S_Mongola-1;Adygei;S_Adygei-1;0;0.77334
137;S_Mongola-1;Makrani;S_Makrani-1;0;0.77334
138;S_Mongola-1;Finnish;S_Finnish-3;0;0.77319
139;S_Mongola-1;Adygei;S_Adygei-2;0;0.77319
140;S_Mongola-1;Kalash;S_Kalash-2;0;0.77319
141;S_Mongola-1;Turkish-Kayseri;S_Turkish-Kayseri-2;0;0.77319
142;S_Mongola-1;Chechen;S_Chechen-1;0;0.77312
143;S_Mongola-1;Papuan;S_Papuan-9;0;0.77307
144;S_Mongola-1;Russian;S_Russian-2;0;0.77288
145;S_Mongola-1;Icelandic;S_Icelandic-1;0;0.77260
146;S_Mongola-1;Finnish;S_Finnish-2;0;0.77258
147;S_Mongola-1;Papuan;S_Papuan-12;0;0.77257
148;S_Mongola-1;Kalash;S_Kalash-1;0;0.77247
149;S_Mongola-1;Lezgin;S_Lezgin-1;0;0.77245
150;S_Mongola-1;Papuan;S_Papuan-8;0;0.77232
151;S_Mongola-1;Russia_Abkhasian;S_Abkhasian-1;0;0.77197
152;S_Mongola-1;Iranian-Fars;S_Iranian-Fars-1;0;0.77194
153;S_Mongola-1;Brahui;S_Brahui-2;0;0.77178
154;S_Mongola-1;Russia_Abkhasian;S_Abkhasian-2;0;0.77170
155;S_Mongola-1;Papuan;S_Papuan-1;0;0.77164
156;S_Mongola-1;Norwegian;S_Norwegian-1;0;0.77159
157;S_Mongola-1;Orcadian;S_Orcadian-2;0;0.77158
158;S_Mongola-1;Estonian;S_Estonian-1;0;0.77155
159;S_Mongola-1;Papuan;S_Papuan-7;0;0.77150
160;S_Mongola-1;Papuan;S_Papuan-11;0;0.77146
161;S_Mongola-1;Estonian;S_Estonian-2;0;0.77144
162;S_Mongola-1;Papuan;S_Papuan-13;0;0.77131
163;S_Mongola-1;Tajik;S_Tajik-1;0;0.77131
164;S_Mongola-1;Papuan;S_Papuan-14;0;0.77129
165;S_Mongola-1;Hungarian;S_Hungarian-2;0;0.77120
166;S_Mongola-1;Czech;S_Czech-2;0;0.77120
167;S_Mongola-1;Papuan;S_Papuan-3;0;0.77119
168;S_Mongola-1;Icelandic;S_Icelandic-2;0;0.77119
169;S_Mongola-1;Hungarian;S_Hungarian-1;0;0.77111
170;S_Mongola-1;Polish;S_Polish-1;0;0.77110
171;S_Mongola-1;Bulgarian;S_Bulgarian-1;0;0.77106
172;S_Mongola-1;Greek;S_Greek-1;0;0.77103
173;S_Mongola-1;Iranian-Fars;S_Iranian-Fars-2;0;0.77103
174;S_Mongola-1;Papuan;S_Papuan-5;0;0.77101
175;S_Mongola-1;French;S_French-2;0;0.77082
176;S_Mongola-1;Georgian;S_Georgian-1;0;0.77071
177;S_Mongola-1;Balochi;S_Balochi-1;0;0.77062
178;S_Mongola-1;Spanish;S_Spanish-1;0;0.77061
179;S_Mongola-1;Armenian;S_Armenian-1;0;0.77054
180;S_Mongola-1;Papuan;S_Papuan-6;0;0.77049
181;S_Mongola-1;Bergamo;S_Bergamo-2;0;0.77017
182;S_Mongola-1;Papuan;S_Papuan-2;0;0.77008
183;S_Mongola-1;Bulgarian;S_Bulgarian-2;0;0.77007
184;S_Mongola-1;Papuan;S_Papuan-4;0;0.77005
185;S_Mongola-1;Spanish;S_Spanish-2;0;0.76981
186;S_Mongola-1;Greek;S_Greek-2;0;0.76981
187;S_Mongola-1;Basque;S_Basque-1;0;0.76979
188;S_Mongola-1;English;S_English-1;0;0.76977
189;S_Mongola-1;Lezgin;S_Lezgin-2;0;0.76975
190;S_Mongola-1;Tuscan;S_Tuscan-2;0;0.76960
191;S_Mongola-1;Albanian.DG;S_Albanian1;0;0.76953
192;S_Mongola-1;English;S_English-2;0;0.76951
193;S_Mongola-1;Armenian;S_Armenian-2;0;0.76950
194;S_Mongola-1;Sardinian;S_Sardinian-2;0;0.76946
195;S_Mongola-1;Orcadian;S_Orcadian-1;0;0.76909
196;S_Mongola-1;Tuscan;S_Tuscan-1;0;0.76906
197;S_Mongola-1;Jew_Iraqi;S_Iraqi_Jew-1;0;0.76901
198;S_Mongola-1;Basque;S_Basque-2;0;0.76888
199;S_Mongola-1;Georgian;S_Georgian-2;0;0.76886
200;S_Mongola-1;Jew_Iraqi;S_Iraqi_Jew-2;0;0.76865
201;S_Mongola-1;Jordanian;S_Jordanian-3;0;0.76809
202;S_Mongola-1;French;S_French-1;0;0.76796
203;S_Mongola-1;BedouinB;S_BedouinB-2;0;0.76779
204;S_Mongola-1;Druze;S_Druze-1;0;0.76757
205;S_Mongola-1;Druze;S_Druze-2;0;0.76754
206;S_Mongola-1;Makrani;S_Makrani-2;0;0.76747
207;S_Mongola-1;Jew_Yemenite;S_Yemenite_Jew-2;0;0.76622
208;S_Mongola-1;Jew_Yemenite;S_Yemenite_Jew-1;0;0.76575
209;S_Mongola-1;Sardinian;S_Sardinian-1;0;0.76564
210;S_Mongola-1;BedouinB;S_BedouinB-1;0;0.76460
211;S_Mongola-1;Jordanian;S_Jordanian-2;0;0.76413
212;S_Mongola-1;Samaritan;S_Samaritan-1;0;0.76396
213;S_Mongola-1;Jordanian;S_Jordanian-1;0;0.76261
214;S_Mongola-1;Saharawi;S_Saharawi-2;0;0.75981
215;S_Mongola-1;Saharawi;S_Saharawi-1;0;0.75964
216;S_Mongola-1;Mozabite;S_Mozabite-1;0;0.75937
217;S_Mongola-1;Mozabite;S_Mozabite-2;0;0.75824
222;S_Mongola-1;Somali;S_Somali-1;0;0.74788
224;S_Mongola-1;Masai;S_Masai-2;0;0.74381
226;S_Mongola-1;Masai;S_Masai-1;0;0.74274
232;S_Mongola-1;Gambian;S_Gambian-2;0;0.73200
233;S_Mongola-1;BantuKenya;S_BantuKenya-1;0;0.73139
234;S_Mongola-1;Luo;S_Luo-2;0;0.73107
235;S_Mongola-1;BantuKenya;S_BantuKenya-2;0;0.73020
236;S_Mongola-1;Luhya;S_Luhya-1;0;0.73005
237;S_Mongola-1;Luhya;S_Luhya-2;0;0.73002
238;S_Mongola-1;Mandenka;S_Mandenka-2;0;0.72934
239;S_Mongola-1;Gambian;S_Gambian-1;0;0.72933
240;S_Mongola-1;Esan;S_Esan-2;0;0.72920
241;S_Mongola-1;Yoruba;S_Yoruba-2;0;0.72879
242;S_Mongola-1;Mandenka;S_Mandenka-1;0;0.72872
243;S_Mongola-1;Yoruba;S_Yoruba-1;0;0.72816
244;S_Mongola-1;Esan;S_Esan-1;0;0.72810
245;S_Mongola-1;Mende;S_Mende-1;0;0.72793
246;S_Mongola-1;Mende;S_Mende-2;0;0.72788
247;S_Mongola-1;Biaka;S_Biaka-1;0;0.72484
248;S_Mongola-1;Biaka;S_Biaka-2;0;0.72347
249;S_Mongola-1;Mbuti;S_Mbuti-3;0;0.72046
250;S_Mongola-1;Mbuti;S_Mbuti-1;0;0.72010
251;S_Mongola-1;Mbuti;S_Mbuti-2;0;0.72005
252;S_Mongola-1;Khomani_San;S_Khomani_San-2;0;0.71521
253;S_Mongola-1;Ju_hoan_North;S_Ju_hoan_North-2;0;0.71514
254;S_Mongola-1;Ju_hoan_North;S_Ju_hoan_North-3;0;0.71460
255;S_Mongola-1;Khomani_San;S_Khomani_San-1;0;0.71302
PROOF G25 distances shouldn't be trusted
The late Paleolithic African paper showed that there was Eurasian geneflow back to Africa in the Paleolithic that affected pretty much all Africans including Mbuti. In other words even Mbuti got some Eurasian genes during the Paleolithic. Least affected were Khomani and Ju-Hoan.
The IBS list I posted accurately shows this by showing Mongola closer to Mbuti than to Khomani and Ju-Hoan.
The G25 (scaled) on the other hand gets it all wrong. You can try it yourself. It wrongly shows Mongola significantly closer to Khomani-San than Mbuti ! If it gets this wrong then how should the pops be trusted.
Distance to: Mongola
0.918673 Khomani_San
0.98425066 Ju_hoan_North
0.99607508 Mbuti
Here's additional proof something is not right with the G25. Everyone should know that Eurasians such as Kurds should be closest to other Eurasians and not Africans.
G25 also wrongly shows Kurds closer to Yorubans and Esans than to Papuans which is absurd. Additionally, G25 wrongly shows Kurds closer to Sudanese than to Karitiana and Surui.
Additionally, G25 wrongly shows Kurds closer to Jordanians than to E. Europeans and Uyghur. I can go on and on with the wrong rankings in G25.
NO;Population;G25 distance to Kurdish
1;Turkish_Kayseri;0.04594
2;Armenian_B;0.04996
3;Abkhasian;0.07100
4;Adygei;0.07185
5;Chechen;0.07279
6;Jordanian;0.09159
7;Balochi;0.12169
8;Albanian;0.12363
9;Brahui;0.12457
10;Bulgarian;0.13177
11;French_Al;0.16473
12;BedouinB;0.16728
13;Hungarian;0.16929
14;Czech;0.18128
15;Basque_French;0.19215
16;Finnish;0.21537
17;Mozabite;0.23311
18;Saharawi;0.26496
19;Uygur;0.28771
20;Hazara;0.28992
21;Kirghiz;0.39622
22;Jarawa;0.42858
23;Somali;0.43369
24;Mongolian;0.46764
25;Mongola;0.55815
26;Eskimo_Sireniki;0.56139
27;Japanese;0.58489
28;Sudanese;0.69730
29;Karitiana;0.71006
30;Surui;0.71489
31;Yoruba;0.74242
32;Esan_Nigeria;0.74434
33;Papuan;0.78951
34;Khomani_San;0.83812
35;Ju_hoan_North;0.90933
36;Mbuti;0.92566
Unlike G25 the Plink IBS gene to gene comparison correctly shows Kurds closer to other Eurasians (Papuans, Karitiana, Surui) than to SSA. It also correctly shows Kurds closer to E. Europeans, Baloch, Brahui, Hazara and Uyghur than to Jordanians etc, etc
NO;POPULATION;DST
1;Lezgin;0.85119
2;Armenian;0.85040
3;Adygei;0.85039
4;Abkhasian;0.85027
5;Turkish-Kayseri;0.85012
6;Chechen;0.84983
7;Czech;0.84973
8;Hungarian;0.84956
9;Bulgarian;0.84940
10;French;0.84880
11;Basque;0.84860
12;Finnish;0.84860
13;Russian;0.84855
14;Estonian;0.84832
15;Sardinian;0.84817
16;Polish;0.84797
17;Pathan;0.84782
18;Tajik;0.84777
19;Kalash;0.84722
20;Sindhi;0.84702
21;Jew_Yemenite;0.84700
22;Tlingit;0.84695
23;Balochi;0.84675
24;Brahui;0.84615
25;Brahmin;0.84608
26;Samaritan;0.84603
27;BedouinB;0.84589
28;Saami;0.84589
29;Uyghur;0.84578
30;Makrani;0.84567
31;Mansi;0.84565
32;Bengali;0.84557
33;Punjabi;0.84517
34;Hazara;0.84498
35;Kyrgyz_Kyrgyzstan;0.84454
36;Jordanian;0.84422
37;Mala;0.84288
38;Tubalar;0.84250
39;Irula;0.84181
40;Even;0.84074
41;Mongola;0.84070
42;Tu;0.84029
43;Hezhen;0.84020
44;Mixtec;0.84018
45;Yakut;0.84000
46;Burmese;0.83998
47;Mexico_Zapotec.DG;0.83971
48;Xibo;0.83970
49;Naxi;0.83951
50;Han;0.83945
51;Korean;0.83923
52;Japanese;0.83898
53;Mayan;0.83886
54;Khonda_Dora;0.83884
55;Daur;0.83884
56;Tujia;0.83882
57;Quechua;0.83881
58;Eskimo_Sireniki.DG;0.83873
59;Oroqen;0.83861
60;Ulchi;0.83859
61;Eskimo_Naukan.DG;0.83855
62;She;0.83853
63;Miao;0.83845
64;Yi;0.83844
65;Itelmen;0.83824
66;Mixe;0.83819
67;Kinh;0.83813
68;China_Lahu;0.83783
69;Pima;0.83775
70;Thai;0.83774
71;Eskimo_Chaplin.DG;0.83767
72;Cambodian;0.83766
73;YANA_UP_WGS;0.83735
74;Dai;0.83730
75;Kusunda;0.83724
76;Piapoco;0.83703
77;Ami.DG;0.83696
78;Karitiana;0.83687
79;Surui;0.83654
80;Igorot;0.83649
81;Dusun;0.83639
82;Saharawi;0.83398
83;Mozabite;0.83287
84;Bougainville;0.83084
85;Papuan;0.82871
86;Somali;0.81444
87;Masai;0.80654
88;BantuKenya;0.79064
89;Luo;0.79045
90;Gambian;0.78966
91;Luhya;0.78919
92;Mandenka;0.78855
93;Esan;0.78710
94;Mende;0.78708
95;Yoruba;0.78690
96;Biaka;0.78118
97;Mbuti;0.77853
98;Ju_hoan_North;0.77354
99;Khomani_San;0.77330
Lucas
03-10-2021, 09:00 PM
Unlike G25 the Plink IBS gene to gene comparison correctly shows Kurds closer to other Eurasians (Papuans, Karitiana, Surui) than to SSA. It also correctly shows Kurds closer to E. Europeans, Baloch, Brahui, Hazara and Uyghur than to Jordanians etc, etc
Zoro, you are somewhat comparing apples to oranges: a list of Euclidean distances based on PCA values versus a direct gene-to-gene comparison.
Even if IBS were better for distances between pops, you can't make an admixture breakdown with it, which is what most people like.
Zoro, you are somewhat comparing apples to oranges: a list of Euclidean distances based on PCA values versus a direct gene-to-gene comparison.
Even if IBS were better for distances between pops, you can't make an admixture breakdown with it, which is what most people like.
Another way to put what you just said is that a one-to-one, gene-to-gene comparison using IBS is a more accurate method than G25 or an Admixture calculator for determining the genetic similarity between two pops, say Kurds and Bulgarians or Mongolians.
I'm reminded of something Dilawer told me a while back. He said Admixture or PCA-based methods don't portray the genetic similarity between two populations as accurately as a one-to-one IBS comparison. They just cluster based on geography and not on genes. That's partly why individuals in a population have all sorts of phenotypes but Admixture or PCA still clusters them together.
Although PCA or Admixture places Kurds or Poles within clusters, if one does IBS on individual Poles or Kurds then they may show widely differing results with regard to genetic similarity with Siberians or E. Asians, depending on which components the calculator uses or what samples the G25 PCA used. By contrast, IBS results do not depend on these choices and are unaffected by which samples are used.
This may in fact be more closely aligned with their phenotypes than G25 or Admixture results, which would place the Poles or Kurds within clusters that do not explain their individual phenotypes the way IBS would.
Komintasavalta
03-10-2021, 11:43 PM
Did you create the merged dataset yourself, or did you find it somewhere?
It's from this post by Razib Khan: https://www.gnxp.com/WordPress/2018/07/11/tutorial-to-run-pca-admixture-treemix-and-pairwise-fst-in-one-command/.
Even if IBS were better for distances between pops, you can't make an admixture breakdown with it, which is what most people like.
Khvorykh et al. 2020 even did admixture-style analysis based on the number of shared IBD segments: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7696950/:
The fourth stage of our computations is unique to this research and was absent in Fedorova et al. 2016. In this stage, we created Supplementary Table S4 using the program rankingATLAS2_v9.pl, and the data from the Supplementary Table S1 ("IBD Normalized Numbers"). Supplementary Table S4 presents the percentages of relative relatedness of each population to the nine Distinct Human Genetic Regions (DHGRs) (AFE, AFW, AMR, EUR, ARC, EAS, OCE, SAS, and MDE, see Results section). For each population (e.g., Georgia) the program counts the numbers of shared IBD fragments per pair of individuals for this population with the three representatives of DHGR region and then makes a sum of these three numbers. For example, the for the AFE region, the summing number of shared IBDs will be the following: 0.48 IBDs (per pair for Georgia vs. LWK) + 0.92 (Georgia vs. Din_AFR) + 3.12 (Georgia vs. Mas_AFR) = 4.52 (for the AFE group). And so on for each DHGR group. In order to minimize the Founder effect in our calculations, we created an upper threshold of 100 shared IBD segments for any populational pair. For example, in a calculation of Congo (Con_AFR) vs. LWK, the original value was 151.9, however, with the threshold in place, the program changed the value to 100). Finally, we calculated the relative percentages for all 9 components (AFE, AFW, AMR, EUR, ARC, EAS, OCE, SAS, and MDE) in a way that ensured their sum was always 100%. Ranking data for each population (as presented in Table 2) were also obtained by rankingATLAS2_v9.pl.
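The arithmetic described in that passage can be sketched in awk. This is my own illustration, not rankingATLAS2_v9.pl: the function name `dhgr_percent` and the "region value1 value2 value3" input layout are assumptions, and I apply the cap of 100 to each value before summing, which is how I read the description.

```shell
# dhgr_percent (hypothetical helper): each input line is "region v1 v2 v3", the
# normalized shared-IBD counts against the three representatives of one region.
# Every value is capped at 100, the capped values are summed per region, and the
# sums are rescaled so that all regions together add up to 100%.
dhgr_percent() {
  awk -v cap=100 '
    {
      s = 0
      for (i = 2; i <= NF; i++) s += ($i > cap ? cap : $i)
      region[NR] = $1; sum[NR] = s; total += s
    }
    END {
      for (r = 1; r <= NR; r++)
        printf "%s %.2f %.1f\n", region[r], sum[r], 100 * sum[r] / total
    }'
}

# The AFE row uses the Georgia numbers quoted above (0.48 + 0.92 + 3.12 = 4.52).
# The OTHER row is made up, only to show the cap turning 151.9 into 100.
printf '%s\n' 'AFE 0.48 0.92 3.12' 'OTHER 151.9 30.0 5.48' | dhgr_percent
```

With only two regions in the input the percentages are of course not meaningful. The real table normalizes over all nine DHGR groups.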
Here's a graph I made of some populations from Khvorykh's table S4:
https://i.ibb.co/3dkkgnx/khvorykh-ibd.png
curl -Ls pastebin.com/raw/BmNdqWvi|tr -d \\r>/tmp/tables4
printf %s\\n Sau_MDE Ira_MDE Rom_EUR Gre_EUR Ger_EUR GBR_EUR Swe_EUR Lat_EUR Rus_EUR Est_EUR Fin_EUR FIN_EUR Ing_EUR Kar_EUR Vep_EUR Saa_EUR Mor_EUR Kom_EUR Udm_EUR Mar_EUR Mis_EUR Kry_EUR Tat_EUR Chu_EUR BSh_EUR Man_SIB Kha_SIB Tun_SIB For_SIB Nen_SIB Nga_SIB Bur_SIB Yak_SIB Ale_ARC>/tmp/pop
awk -F, 'NR==1{print;next}NR==FNR{a[$1]=$0;next}$1 in a{print a[$1]}' /tmp/tables4 /tmp/pop|awk -F, -v OFS=, '{print$2,$6,$11,$10,$7,$8,$5,$9,$3,$4}'>/tmp/a
R -e 'library("ggplot2")
library("reshape2");
library("forcats") # for fct_rev
t=read.csv("/tmp/a",header=T,check.names=F)
t2=melt(t,id.var="Population")
lab=round(t2$value)
lab[lab<=2]=""
t2$lab=lab
t2$value=t2$value/100
ggplot(t2,aes(x=fct_rev(factor(Population,level=un ique(Population))),y=value,fill=variable))+
geom_bar(stat="identity",width=1,position=position_fill(reverse=T))+
geom_text(aes(label=lab),position=position_stack(v just=.5,reverse=T),size=2.5)+
coord_flip()+
theme(
axis.text=element_text(color="black"),
axis.text.x=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
legend.margin=margin(0),
legend.title=element_blank(),
panel.background=element_rect(fill="white"),
)+
xlab("")+
scale_x_discrete(expand=c(0,0))+
scale_y_discrete(expand=c(0,0))+
ggsave("/tmp/a.png",width=6,height=7)'
The proportion of the Northern European component was defined based on the number of shared IBD segments with Estonians, Germans, and Swedes. So for example Swedes have a higher proportion of the Northern European component than Latvians.
Komintasavalta
03-11-2021, 12:14 AM
BTW what was G25 made with? The AG user anglesqueville said it was made with SmartPCA (https://anthrogenica.com/showthread.php?22231-does-g25-have-a-north-and-especially-east-european-bias/page2):
G25 is not a so-called "calculator", it is a PCA calculated directly on a large "raw data" database (of allele readings) using a well-known program (smartpca, Eigensoft package, Nick Patterson).
However when I tried googling "smartpca site:eurogenes.blogspot.com", there were only two hits, neither of which even matched text written by Davidski.
It's possible to encode a 10,000 by 10,000 matrix of distances between populations as a 10,000 by 25 matrix where the columns are principal components. Then you can retrieve the original distance between two populations fairly accurately by calculating the Euclidean distance between their rows.
For example here I generated an 11 by 11 matrix of FST distances:
R -e 'library(admixtools);
f2m=function(x){t=as.data.frame(x[,1:3]);t2=rbind(t,setNames(t[,c(2,1,3)],names(t)));xtabs(t2[,3]~t2[,2]+t2[,1])};
fst=fst("g/v44.3_1240K_public/v44.3_1240K_public",c("Biaka.DG","Even.DG","Finnish.DG","Ju_hoan_North.DG","Khomani_San.DG","Korean.DG","Mbuti.DG","Mongola.DG","Papuan.DG","Turkey_N.DG","Yoruba.DG"));
write.csv(round(f2m(fst),6),"fst",quote=F)'
$ cat fst
,Biaka.DG,Even.DG,Finnish.DG,Ju_hoan_North.DG,Khomani_San.DG,Korean.DG,Mbuti.DG,Mongola.DG,Papuan.DG,Turkey_N.DG,Yoruba.DG
Biaka.DG,0,0.212276,0.182032,0.086521,0.093686,0.208092,0.055175,0.200832,0.264921,0.19757,0.037891
Even.DG,0.212276,0,0.099165,0.260155,0.269936,0.027304,0.243293,0.020451,0.188681,0.138516,0.189624
Finnish.DG,0.182032,0.099165,0,0.22675,0.236001,0.102589,0.211397,0.089601,0.188651,0.03734,0.156253
Ju_hoan_North.DG,0.086521,0.260155,0.22675,0,0.034955,0.255676,0.102751,0.247671,0.311007,0.244202,0.108353
Khomani_San.DG,0.093686,0.269936,0.236001,0.034955,0,0.264307,0.110281,0.256679,0.319966,0.253402,0.115599
Korean.DG,0.208092,0.027304,0.102589,0.255676,0.264307,0,0.238141,0.001142,0.178226,0.136865,0.184756
Mbuti.DG,0.055175,0.243293,0.211397,0.102751,0.110281,0.238141,0,0.230583,0.294664,0.228177,0.077978
Mongola.DG,0.200832,0.020451,0.089601,0.247671,0.256679,0.001142,0.230583,0,0.171326,0.130389,0.176566
Papuan.DG,0.264921,0.188681,0.188651,0.311007,0.319966,0.178226,0.294664,0.171326,0,0.215617,0.241977
Turkey_N.DG,0.19757,0.138516,0.03734,0.244202,0.253402,0.136865,0.228177,0.130389,0.215617,0,0.172992
Yoruba.DG,0.037891,0.189624,0.156253,0.108353,0.115599,0.184756,0.077978,0.176566,0.241977,0.172992,0
Classical multidimensional scaling (MDS) applied to Euclidean distances produces the same coordinates as PCA (up to sign flips of the axes), but the difference is that it takes a distance matrix as input. I used MDS to reduce the FST matrix to three dimensions:
$ R -e 't=read.csv("fst",row.names=1,header=T);cmdscale(as.dist(t),k=3)'
[,1] [,2] [,3]
Biaka.DG 0.09458067 -0.009318035 0.0007634203
Even.DG -0.10587237 0.033672133 -0.0493091783
Finnish.DG -0.06971126 0.039180919 0.0443036464
Ju_hoan_North.DG 0.14384037 -0.005407783 -0.0079752958
Khomani_San.DG 0.15305612 -0.005072182 -0.0095401289
Korean.DG -0.10263674 0.022172427 -0.0479094108
Mbuti.DG 0.12082958 -0.006742200 -0.0017669591
Mongola.DG -0.09712661 0.017649424 -0.0402117613
Papuan.DG -0.13332805 -0.137725617 0.0231446908
Turkey_N.DG -0.07026603 0.060365299 0.0804792633
Yoruba.DG 0.06663432 -0.008774385 0.0080217135
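For anyone curious what cmdscale does under the hood, here is a minimal stdlib-only Python sketch of classical MDS: square the distances, double-center them, and take the top eigenvectors scaled by the square roots of their eigenvalues. R uses a full eigendecomposition; I swapped in plain power iteration with deflation to avoid dependencies, so treat it as an illustration that assumes roughly Euclidean input. The function name and the four rectangle points are made up for the demo.

```python
import math


def classical_mds(dist, k, iters=500):
    """Embed a symmetric distance matrix in k dimensions (classical MDS)."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J with J = I - (1/n) * ones
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    coords = [[] for _ in range(n)]
    for _axis in range(k):
        # Power iteration from an arbitrary normalized start vector
        v = [1.0 / (i + 1) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _step in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-15:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = sum(v[i] * bv[i] for i in range(n))  # Rayleigh quotient
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i].append(scale * v[i])
        # Deflate so that the next pass converges to the next eigenvector
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Made-up test case: corners of a 3 by 4 rectangle, so the input is exactly Euclidean.
points = [(0, 0), (3, 0), (0, 4), (3, 4)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords = classical_mds(dist, k=2)
diagonal = math.dist(coords[0], coords[3])
print(diagonal)
```

The recovered coordinates can be rotated or mirrored relative to the input points, but the pairwise distances are what matters here. With FST as input the matrix is not exactly Euclidean, so some reconstruction error is expected.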
Then even though there are only 3 principal components, I can still retrieve the original distance between a pair of populations fairly accurately (the original FST between Biaka and Even was 0.212276):
$ R -e 't=read.csv("fst",row.names=1,header=T);c=cmdscale(as.dist(t),k=3);sqrt(sum((c["Biaka.DG",]-c["Even.DG",])^2))'
[1] 0.2110375
With 25 components, it's possible to encode the distances even between tens of thousands of populations more or less accurately. If more components were needed, you could just as well make a G50 or G100 or something.
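The same check can be repeated for a few more pairs. Here is a small Python sketch that uses the three-dimensional coordinates and the FST values printed earlier in this post; only the helper name `recovered` is mine.

```python
import math

# Three MDS coordinates per population, copied from the cmdscale output in this post.
coords = {
    "Biaka.DG": (0.09458067, -0.009318035, 0.0007634203),
    "Even.DG": (-0.10587237, 0.033672133, -0.0493091783),
    "Finnish.DG": (-0.06971126, 0.039180919, 0.0443036464),
    "Korean.DG": (-0.10263674, 0.022172427, -0.0479094108),
    "Mongola.DG": (-0.09712661, 0.017649424, -0.0402117613),
    "Turkey_N.DG": (-0.07026603, 0.060365299, 0.0804792633),
}

# Original FST values for the same pairs, copied from the FST matrix in this post.
fst = {
    ("Biaka.DG", "Even.DG"): 0.212276,
    ("Finnish.DG", "Turkey_N.DG"): 0.03734,
    ("Korean.DG", "Mongola.DG"): 0.001142,
}


def recovered(a, b):
    """Euclidean distance between two populations in the 3-dimensional MDS space."""
    return math.dist(coords[a], coords[b])


for (a, b), original in fst.items():
    print(f"{a} vs {b}: FST {original:.6f}, recovered {recovered(a, b):.6f}")
```

In this data the large Biaka vs. Even distance comes back within about 1%, while the very small Korean vs. Mongola FST comes back larger than the original. So three dimensions seem to preserve the big distances much better than the small ones, which is one reason to keep more components.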
Very good. You're thinking outside the box! Yes, of course you can make a calculator based on FST or IBS. You can compute IBS between the target and WHG, ENF, ANS, etc., and even square the individual results to widen the differences between them, then assign each reference a prorated share of 100%.
At least it wouldn't have the biases and variability of G25 or Admixture, where the results depend on the other samples in the run.
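A quick Python sketch of that idea, just to show the arithmetic: raise each IBS similarity to a power and prorate the results to 100%. The IBS values and the function name are hypothetical, not taken from any real run.

```python
def prorated_shares(similarity, power=2):
    """Raise each similarity to `power`, then rescale the results to sum to 100%."""
    weighted = {ref: value ** power for ref, value in similarity.items()}
    total = sum(weighted.values())
    return {ref: 100.0 * w / total for ref, w in weighted.items()}


# Hypothetical IBS similarities between one target and three reference groups.
ibs = {"WHG": 0.85, "ENF": 0.84, "ANS": 0.79}

plain = prorated_shares(ibs, power=1)
squared = prorated_shares(ibs, power=2)

for ref in ibs:
    print(f"{ref}: plain {plain[ref]:.2f}%, squared {squared[ref]:.2f}%")
```

With these made-up inputs the gap between the highest and lowest share grows after squaring, which is the effect described above. How large a power is sensible would have to be decided by whoever builds the calculator.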
Peterski
03-11-2021, 03:53 AM
One way to reword what you just said: a one-to-one, gene-to-gene comparison using IBS is a more accurate way than G25 or an Admixture calculator to determine the genetic similarity between two populations, say Kurds and Bulgarians or Mongolians.
I'm reminded of something Dilawer told me a while back. He said that Admixture and PCA-based methods don't portray the genetic similarity between two populations as accurately as a one-to-one IBS comparison. They just cluster by geography, not by genes. That's partly why individuals in a population have all sorts of phenotypes, yet Admixture or PCA still clusters them together.
Although PCA or Admixture places Kurds or Poles within clusters, running IBS on individual Poles or Kurds may show widely differing genetic similarity to Siberians or East Asians. Admixture and G25 results depend on which components the calculator uses or which samples went into the G25 PCA. By contrast, IBS results do not depend on any of this and are unaffected by which other samples are used.
This may in fact align more closely with their phenotypes than G25 or Admixture results, which place Poles or Kurds within clusters that do not explain their individual phenotypes the way IBS would.
It's from this post by Razib Khan: https://www.gnxp.com/WordPress/2018/07/11/tutorial-to-run-pca-admixture-treemix-and-pairwise-fst-in-one-command/.
Khvorykh et al. 2020 even did admixture-style analysis based on the number of shared IBD segments: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7696950/
Very interesting!
Lucas
03-11-2021, 09:41 AM
BTW what was G25 made with? The AG user anglesqueville said it was made with SmartPCA (https://anthrogenica.com/showthread.php?22231-does-g25-have-a-north-and-especially-east-european-bias/page2):
G25 is not a so-called "calculator", it is a PCA calculated directly on a large "raw data" database (of allele readings) using a well-known program (smartpca, Eigensoft package, Nick Patterson).
However when I tried googling "smartpca site:eurogenes.blogspot.com", there were only two hits, neither of which even matched text written by Davidski.
It's possible to encode a 10,000 by 10,000 matrix of distances between populations as a 10,000 by 25 matrix where the columns are PC components. Then you can retrieve the original distances between two rows of the table fairly accurately by calculating the Euclidean distance between the rows.
With 25 components, it's possible to encode the distances even between tens of thousands of populations more or less accurately. If more components would be necessary, you could just as well make a G50 or G100 or something.
It is definitely SmartPCA, not Plink PCA. Davidski talks about it here: https://eurogenes.blogspot.com/2017/05/pca-projection-bias-fix.html
BTW user vbknethio created G30 using SmartPCA: https://www.theapricity.com/forum/showthread.php?314681-BCE-G30-beta-PCA-with-6000-samples. It is the same kind of product as G25; he just didn't have the same samples.
Zanzibar
06-09-2021, 05:16 AM
I wonder why the Mari have about 3-3.5 times more East Eurasian ancestry than the Mordovians (~30% vs ~10%). Those two republics are not even that far away from each other.
Could it be that the Mari directly absorbed and mixed with Bashkirs and Ugrics like the Khanty and Mansi, while the Mordovians received their East Eurasian ancestry from an indirect source? That could be why.
Anyway, do you consider groups like the Mari, Udmurt and Saami European or more mixed race?
Zanzibar
06-09-2021, 05:26 AM
I didn't know Udmurts had such high Mongoloid ancestry considering the predominance of red hair in them
Yep they are genetically around 25% Mongoloid at least or a bit more. It's surprising how much red hair they have considering how much East Asian they are. They are literally quapas aka 1/4 Asian lol.
LorenzoSpitaleri
06-09-2021, 06:12 AM
Yep they are genetically around 25% Mongoloid at least or a bit more. It's surprising how much red hair they have considering how much East Asian they are. They are literally quapas aka 1/4 Asian lol.
Very impressive. I used to think Mongoloid genes negated any chance of red hair appearing. But now that I think of it, I have seen even straight-up mestizos and even darker mixes with that hair colour.
Zanzibar
06-09-2021, 07:06 AM
Very impressive. I used to think Mongoloid genes negated any chance of red hair appearing. But now that I think of it, I have seen even straight-up mestizos and even darker mixes with that hair colour.
Woah you have seen mestizos with red hair? By straight up you mean someone who is 50-50% or has one white parent and another Native parent?
Do you consider Udmurts and others like the Mari and Saami European, or do you think they are mixed race? They are the reverse version of Siberian Turkic/Central Asian ethnicities like Altaians, Kyrgyz and some Kazakhs, who are approximately 25-35% Caucasoid lol.
They are literally the equivalent of someone who has an Asian grandparent lol.
Komintasavalta
06-09-2021, 04:44 PM
Could it be that the Mari directly absorbed and mixed with Bashkirs and Ugrics like Khanty-Mansi while Mordovians receive East Eurasian from an indirect source? That could be why.
Bashkirs have a lot of eastern mtDNA haplogroups but Maris don't. The distribution of eastern mtDNA haplogroups in Bashkirs is otherwise similar to Khanty and Mansi, but Bashkirs have more F, which is common in Shors and Khakasses.
One mysterious thing is that Udmurts have a huge amount of eastern mtDNA compared to Maris and Chuvashes.
https://i.ibb.co/hZHvfF5/tambets-ydna-mtdna-xy.png
https://i.ibb.co/ys9zNMj/complexheatmap-tambets-2018-mtdna.png
LorenzoSpitaleri
06-10-2021, 12:45 AM
Woah you have seen mestizos with red hair? By straight up you mean someone who is 50-50% or has one white parent and another Native parent?
Do you consider Udmurts and others like the Mari and Saami European, or do you think they are mixed race? They are the reverse version of Siberian Turkic/Central Asian ethnicities like Altaians, Kyrgyz and some Kazakhs, who are approximately 25-35% Caucasoid lol.
They are literally the equivalent of someone who has an Asian grandparent lol.
By that I meant people with very strong mixed Amerindian features who look like balanced mestizos, and even people less white than that. A good example is this guy from my city. I don't know him personally, but I reckon he is known for being a redheaded triracial guy.
https://i.imgur.com/JJ2hOtI.jpg
https://i.imgur.com/rkVF7OC.jpg
https://i.imgur.com/eN0hv9E.jpg
Zanzibar
06-10-2021, 03:33 AM
By that I meant people with very strong mixed Amerindian features who look like balanced mestizos, and even people less white than that. A good example is this guy from my city. I don't know him personally, but I reckon he is known for being a redheaded triracial guy.
That's his natural hair color? Ah ok. Have you seen Amerindians with red hair though?
Btw do you consider Udmurts to be mixed race given their high Mongoloid DNA?
LorenzoSpitaleri
06-10-2021, 06:14 PM
That's his natural hair color? Ah ok. Have you seen Amerindians with red hair though?
Btw do you consider Udmurts to be mixed race given their high Mongoloid DNA?
Yes it is. And yes but mixed amerindians not pure.
They're on the way like I said somewhere else, mixed-white range
Zanzibar
06-11-2021, 01:48 AM
Yes it is. And yes but mixed amerindians not pure.
They're on the way like I said somewhere else, mixed-white range
Ok.
Alright. Udmurts score the same amount of Mongoloid as Castizos scoring Amerindian and Quadroons would score SSA. Would you also consider Castizos and Quadroons as being on the way, on the mixed-white range as well?
Unlike G25, the Plink IBS gene-to-gene comparison correctly shows Kurds closer to other Eurasians (Papuans, Karitiana, Surui) than to SSA. It also correctly shows Kurds closer to E. Europeans, Baloch, Brahui, Hazara and Uyghur than to Jordanians, etc.
NO POPULATION DST
1 Lezgin 0.85119
2 Armenian 0.85040
3 Adygei 0.85039
4 Abkhasian 0.85027
5 Turkish-Kayseri 0.85012
6 Chechen 0.84983
7 Czech 0.84973
8 Hungarian 0.84956
9 Bulgarian 0.84940
10 French 0.84880
11 Basque 0.84860
12 Finnish 0.84860
13 Russian 0.84855
14 Estonian 0.84832
15 Sardinian 0.84817
16 Polish 0.84797
17 Pathan 0.84782
18 Tajik 0.84777
19 Kalash 0.84722
20 Sindhi 0.84702
21 Jew_Yemenite 0.84700
22 Tlingit 0.84695
23 Balochi 0.84675
24 Brahui 0.84615
25 Brahmin 0.84608
26 Samaritan 0.84603
27 BedouinB 0.84589
28 Saami 0.84589
29 Uyghur 0.84578
30 Makrani 0.84567
31 Mansi 0.84565
32 Bengali 0.84557
33 Punjabi 0.84517
34 Hazara 0.84498
35 Kyrgyz_Kyrgyzstan 0.84454
36 Jordanian 0.84422
37 Mala 0.84288
38 Tubalar 0.84250
39 Irula 0.84181
40 Even 0.84074
41 Mongola 0.84070
42 Tu 0.84029
43 Hezhen 0.84020
44 Mixtec 0.84018
45 Yakut 0.84000
46 Burmese 0.83998
47 Mexico_Zapotec.DG 0.83971
48 Xibo 0.83970
49 Naxi 0.83951
50 Han 0.83945
51 Korean 0.83923
52 Japanese 0.83898
53 Mayan 0.83886
54 Khonda_Dora 0.83884
55 Daur 0.83884
56 Tujia 0.83882
57 Quechua 0.83881
58 Eskimo_Sireniki.DG 0.83873
59 Oroqen 0.83861
60 Ulchi 0.83859
61 Eskimo_Naukan.DG 0.83855
62 She 0.83853
63 Miao 0.83845
64 Yi 0.83844
65 Itelmen 0.83824
66 Mixe 0.83819
67 Kinh 0.83813
68 China_Lahu 0.83783
69 Pima 0.83775
70 Thai 0.83774
71 Eskimo_Chaplin.DG 0.83767
72 Cambodian 0.83766
73 YANA_UP_WGS 0.83735
74 Dai 0.83730
75 Kusunda 0.83724
76 Piapoco 0.83703
77 Ami.DG 0.83696
78 Karitiana 0.83687
79 Surui 0.83654
80 Igorot 0.83649
81 Dusun 0.83639
82 Saharawi 0.83398
83 Mozabite 0.83287
84 Bougainville 0.83084
85 Papuan 0.82871
86 Somali 0.81444
87 Masai 0.80654
88 BantuKenya 0.79064
89 Luo 0.79045
90 Gambian 0.78966
91 Luhya 0.78919
92 Mandenka 0.78855
93 Esan 0.78710
94 Mende 0.78708
95 Yoruba 0.78690
96 Biaka 0.78118
97 Mbuti 0.77853
98 Ju_hoan_North 0.77354
99 Khomani_San 0.77330
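For readers wondering what a DST value means: assuming the table comes from Plink's `--genome` output, DST is the IBS distance (IBS2 + 0.5*IBS1) / (number of SNP pairs), i.e. the average proportion of alleles two individuals share by state. Here is a minimal Python sketch of that formula for one pair of individuals. The genotype vectors are made up, and how the table averaged over individuals in each population is not stated in the post, so that part is left out.

```python
def ibs_similarity(g1, g2):
    """Plink-style DST for two genotype vectors coded as 0, 1, 2 (None = missing)."""
    shared = 0  # alleles shared identical-by-state, 0 to 2 per site
    sites = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # skip sites missing in either individual
        shared += 2 - abs(a - b)
        sites += 1
    if sites == 0:
        raise ValueError("no overlapping genotyped sites")
    return shared / (2 * sites)


# Made-up genotypes (count of one allele per site) for two individuals.
person_a = [0, 1, 2, 2, None, 1]
person_b = [0, 2, 0, 2, 1, 1]

dst = ibs_similarity(person_a, person_b)
print(dst)
```

Since every pair of humans shares most alleles by state, values bunch up in a narrow range, which is why the whole table only spans roughly 0.77 to 0.85.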
Why are Kurds closer to Europeans than to Jordanians? It makes no sense to me.
Ayetooey
06-12-2021, 06:56 PM
Because a lot of users here are Pan-Europeanists who refuse to acknowledge how genetically diverse this continent truly is.