Why is the amount of East Eurasian ancestry of Saamis and other Uralics underestimated by some here?

**Lucas** · 03-10-2021, 03:45 PM

Originally Posted by Komintasavalta

Est1000HGDP.fam

Did you create the merged dataset yourself, or did you find it somewhere?

**LorenzoSpitaleri** · 03-10-2021, 03:48 PM

I didn't know Udmurts had such high Mongoloid ancestry considering the predominance of red hair in them


**Zoro** · 03-10-2021, 06:06 PM

Originally Posted by Mingle

Can you link the paper?

Here is the supp

https://www.biorxiv.org/content/bior...?download=true

Here is the paper

https://www.biorxiv.org/content/10.1...555v1.full.pdf

**Zoro** · 03-10-2021, 06:46 PM

Originally Posted by Zoro

NO	FID1	FID2	IID2	PI_HAT	IBS
1	S_Mongola-1	Korean	S_Korean-1	0.157	0.81068
2	S_Mongola-1	Han	S_Han-1	0.1538	0.81020
3	S_Mongola-1	Japanese	S_Japanese-1	0.1603	0.80999
4	S_Mongola-1	Xibo	S_Xibo-2	0.1463	0.80968
5	S_Mongola-1	Korean	S_Korean-2	0.1546	0.80955
6	S_Mongola-1	Han	S_Han-2	0.1562	0.80910
7	S_Mongola-1	Tujia	S_Tujia-2	0.1522	0.80896
8	S_Mongola-1	Japanese	S_Japanese-2	0.148	0.80880
9	S_Mongola-1	She	S_She-1	0.1542	0.80875
10	S_Mongola-1	She	S_She-2	0.1535	0.80870
11	S_Mongola-1	Naxi	S_Naxi-1	0.1527	0.80869
12	S_Mongola-1	Japanese	S_Japanese-3	0.1426	0.80865
13	S_Mongola-1	Hezhen	S_Hezhen-2	0.1438	0.80863
14	S_Mongola-1	Yi	S_Yi-1	0.1494	0.80853
15	S_Mongola-1	Xibo	S_Xibo-1	0.1408	0.80837
16	S_Mongola-1	Miao	S_Miao-2	0.1534	0.80827
17	S_Mongola-1	Kinh	S_Kinh-1	0.1488	0.80800
18	S_Mongola-1	Naxi	S_Naxi-3	0.1516	0.80795
19	S_Mongola-1	Hezhen	S_Hezhen-1	0.1514	0.80782
20	S_Mongola-1	Tujia	S_Tujia-1	0.1519	0.80772
21	S_Mongola-1	Mongola	S_Mongola-2	0.1456	0.80755
22	S_Mongola-1	Miao	S_Miao-1	0.1518	0.80748
23	S_Mongola-1	Ulchi	S_Ulchi-1	0.1642	0.80746
24	S_Mongola-1	Oroqen	S_Oroqen-1	0.1575	0.80745
25	S_Mongola-1	Yi	S_Yi-2	0.1529	0.80724
26	S_Mongola-1	Daur	S_Daur-2	0.1422	0.80716
27	S_Mongola-1	Ulchi	S_Ulchi-2	0.1566	0.80713
28	S_Mongola-1	Oroqen	S_Oroqen-2	0.1588	0.80693
29	S_Mongola-1	Dai	S_Dai-1	0.1463	0.80672
30	S_Mongola-1	Even	S_Even-3	0.1583	0.80661
31	S_Mongola-1	Dai	S_Dai-2	0.1519	0.80603
32	S_Mongola-1	Tu	S_Tu-2	0.1387	0.80580
33	S_Mongola-1	Kinh	S_Kinh-2	0.1415	0.80574
34	S_Mongola-1	Thai	S_Thai-2	0.1401	0.80573
35	S_Mongola-1	China_Lahu	S_Lahu-1	0.1524	0.80558
36	S_Mongola-1	Burmese	S_Burmese-1	0.1385	0.80540
37	S_Mongola-1	Tu	S_Tu-1	0.1354	0.80530
38	S_Mongola-1	Ami.DG	S_Ami1	0.1575	0.80503
39	S_Mongola-1	Ami.DG	S_Ami2	0.1595	0.80502
40	S_Mongola-1	Even	S_Even-2	0.1555	0.80488
41	S_Mongola-1	Yakut	S_Yakut-1	0.1485	0.80419
42	S_Mongola-1	China_Lahu	S_Lahu-2	0.1523	0.80397
43	S_Mongola-1	Igorot	S_Igorot-2	0	0.80313
44	S_Mongola-1	Dusun	S_Dusun-2	0	0.80309
45	S_Mongola-1	Dusun	S_Dusun-1	0	0.80308
46	S_Mongola-1	Thai	S_Thai-1	0.1275	0.80306
47	S_Mongola-1	Igorot	S_Igorot-1	0	0.80301
48	S_Mongola-1	Cambodian	S_Cambodian-1	0.1407	0.80241
49	S_Mongola-1	Even	S_Even-1	0.1214	0.80214
50	S_Mongola-1	Burmese	S_Burmese-2	0.1169	0.80213
51	S_Mongola-1	Yakut	S_Yakut-2	0.1438	0.80209
52	S_Mongola-1	Cambodian	S_Cambodian-2	0.134	0.80188
53	S_Mongola-1	Eskimo_Sireniki.DG	S_Sireniki1	0	0.80124
54	S_Mongola-1	Kyrgyz_Kyrgyzstan	S_Kyrgyz-1	0.1127	0.79908
55	S_Mongola-1	Kyrgyz_Kyrgyzstan	S_Kyrgyz-2	0.1005	0.79815
56	S_Mongola-1	Itelmen	S_Itelman-1	0	0.79809
57	S_Mongola-1	Eskimo_Naukan.DG	S_Naukan2	0	0.79789
58	S_Mongola-1	Eskimo_Chaplin.DG	S_Chaplin1	0	0.79770
59	S_Mongola-1	Eskimo_Naukan.DG	S_Naukan1	0	0.79751
60	S_Mongola-1	Eskimo_Sireniki.DG	S_Sireniki2	0	0.79749
61	S_Mongola-1	Kusunda	S_Kusunda-1	0.1132	0.79740
62	S_Mongola-1	Tubalar	S_Tubalar-2	0	0.79509
63	S_Mongola-1	Tubalar	S_Tubalar-1	0.1107	0.79490
64	S_Mongola-1	Chukchi	S_Chukchi-1	0.0841	0.79357
65	S_Mongola-1	Uyghur	S_Uygur-1	0.0898	0.79336
66	S_Mongola-1	Mexico_Zapotec.DG	S_Zapotec1	0	0.79282
67	S_Mongola-1	Mansi	S_Mansi-1	0	0.79238
68	S_Mongola-1	Hazara	S_Hazara-1	0	0.79204
69	S_Mongola-1	Pima	S_Pima-1	0	0.79198
70	S_Mongola-1	Uyghur	S_Uygur-2	0	0.79197
71	S_Mongola-1	Hazara	S_Hazara-2	0	0.79170
72	S_Mongola-1	Mayan	S_Mayan-2	0	0.79120
73	S_Mongola-1	Mixtec	S_Mixtec-1	0	0.79120
74	S_Mongola-1	Mixe	S_Mixe-2	0	0.79115
75	S_Mongola-1	Mexico_Zapotec.DG	S_Zapotec2	0	0.79101
76	S_Mongola-1	Mayan	S_Mayan-1	0	0.79087
77	S_Mongola-1	Quechua	S_Quechua-3	0	0.79075
78	S_Mongola-1	Mixe	S_Mixe-3	0	0.79044
79	S_Mongola-1	Piapoco	S_Piapoco-2	0	0.79029
80	S_Mongola-1	Quechua	S_Quechua-1	0	0.79023
81	S_Mongola-1	Quechua	S_Quechua-2	0	0.78995
82	S_Mongola-1	Pima	S_Pima-2	0	0.78978
83	S_Mongola-1	Mansi	S_Mansi-2	0	0.78962
84	S_Mongola-1	Khonda_Dora	S_Khonda_Dora-1	0	0.78847
85	S_Mongola-1	Tlingit	S_Tlingit-2	0	0.78816
86	S_Mongola-1	Mixtec	S_Mixtec-2	0	0.78811
87	S_Mongola-1	Maori	S_Maori-1	0.0542	0.78805
88	S_Mongola-1	Piapoco	S_Piapoco-1	0	0.78747
89	S_Mongola-1	Karitiana	S_Karitiana-2	0	0.78742
90	S_Mongola-1	Surui	S_Surui-1	0	0.78727
91	S_Mongola-1	Surui	S_Surui-2	0	0.78565
92	S_Mongola-1	Karitiana	S_Karitiana-1	0	0.78561
93	S_Mongola-1	Bengali	S_Bengali-1	0	0.78436
94	S_Mongola-1	Kusunda	S_Kusunda-2	0	0.78408
95	S_Mongola-1	Tlingit	S_Tlingit-1	0	0.78388
96	S_Mongola-1	Relli	S_Relli-1	0	0.78344
97	S_Mongola-1	Kapu	S_Kapu-2	0	0.78280
98	S_Mongola-1	Madiga	S_Madiga-1	0	0.78227
99	S_Mongola-1	Madiga	S_Madiga-2	0	0.78175
100	S_Mongola-1	Mala	S_Mala-3	0	0.78161
101	S_Mongola-1	Yadava	S_Yadava-1	0	0.78157
102	S_Mongola-1	Bengali	S_Bengali-2	0	0.78140
103	S_Mongola-1	Kapu	S_Kapu-1	0	0.78130
104	S_Mongola-1	Irula	S_Irula-2	0	0.78128
105	S_Mongola-1	Mala	S_Mala-2	0	0.78128
106	S_Mongola-1	Punjabi	S_Punjabi-1	0	0.78107
107	S_Mongola-1	Irula	S_Irula-1	0	0.78107
108	S_Mongola-1	Burusho	S_Burusho-2	0	0.78081
109	S_Mongola-1	Yadava	S_Yadava-2	0	0.78078
110	S_Mongola-1	Saami	S_Saami-1	0	0.78063
111	S_Mongola-1	Brahmin	S_Brahmin-2	0	0.78031
112	S_Mongola-1	Saami	S_Saami-2	0	0.78012
113	S_Mongola-1	Relli	S_Relli-2	0	0.77974
114	S_Mongola-1	Punjabi	S_Punjabi-3	0	0.77920
115	S_Mongola-1	Bougainville	S_Bougainville-1	0	0.77900
116	S_Mongola-1	Burusho	S_Burusho-1	0	0.77885
117	S_Mongola-1	Punjabi	S_Punjabi-2	0	0.77885
118	S_Mongola-1	Brahmin	S_Brahmin-1	0	0.77874
119	S_Mongola-1	Bougainville	S_Bougainville-2	0	0.77866
120	S_Mongola-1	Sindhi	S_Sindhi-2	0	0.77851
121	S_Mongola-1	Pathan	S_Pathan-1	0	0.77838
122	S_Mongola-1	Punjabi	S_Punjabi-4	0	0.77776
123	S_Mongola-1	Kurd-Iraq	WGS	0	0.77625
124	S_Mongola-1	Pathan	S_Pathan-2	0	0.77597
125	S_Mongola-1	Ossetian-North	S_Ossetian-1	0	0.77575
126	S_Mongola-1	Russian	S_Russian-1	0	0.77570
127	S_Mongola-1	Finnish	S_Finnish-1	0	0.77476
128	S_Mongola-1	Sindhi	S_Sindhi-1	0	0.77473
129	S_Mongola-1	Turkish-Kayseri	S_Turkish-Kayseri-1	0	0.77463
130	S_Mongola-1	Tajik	S_Tajik-2	0	0.77448
131	S_Mongola-1	YANA_UP_WGS	Yana1	0	0.77422
132	S_Mongola-1	Ossetian-North	S_Ossetian-2	0	0.77413
133	S_Mongola-1	Papuan	S_Papuan-10	0	0.77381
134	S_Mongola-1	Balochi	S_Balochi-2	0	0.77365
135	S_Mongola-1	Brahui	S_Brahui-1	0	0.77363
136	S_Mongola-1	Adygei	S_Adygei-1	0	0.77334
137	S_Mongola-1	Makrani	S_Makrani-1	0	0.77334
138	S_Mongola-1	Finnish	S_Finnish-3	0	0.77319
139	S_Mongola-1	Adygei	S_Adygei-2	0	0.77319
140	S_Mongola-1	Kalash	S_Kalash-2	0	0.77319
141	S_Mongola-1	Turkish-Kayseri	S_Turkish-Kayseri-2	0	0.77319
142	S_Mongola-1	Chechen	S_Chechen-1	0	0.77312
143	S_Mongola-1	Papuan	S_Papuan-9	0	0.77307
144	S_Mongola-1	Russian	S_Russian-2	0	0.77288
145	S_Mongola-1	Icelandic	S_Icelandic-1	0	0.77260
146	S_Mongola-1	Finnish	S_Finnish-2	0	0.77258
147	S_Mongola-1	Papuan	S_Papuan-12	0	0.77257
148	S_Mongola-1	Kalash	S_Kalash-1	0	0.77247
149	S_Mongola-1	Lezgin	S_Lezgin-1	0	0.77245
150	S_Mongola-1	Papuan	S_Papuan-8	0	0.77232
151	S_Mongola-1	Russia_Abkhasian	S_Abkhasian-1	0	0.77197
152	S_Mongola-1	Iranian-Fars	S_Iranian-Fars-1	0	0.77194
153	S_Mongola-1	Brahui	S_Brahui-2	0	0.77178
154	S_Mongola-1	Russia_Abkhasian	S_Abkhasian-2	0	0.77170
155	S_Mongola-1	Papuan	S_Papuan-1	0	0.77164
156	S_Mongola-1	Norwegian	S_Norwegian-1	0	0.77159
157	S_Mongola-1	Orcadian	S_Orcadian-2	0	0.77158
158	S_Mongola-1	Estonian	S_Estonian-1	0	0.77155
159	S_Mongola-1	Papuan	S_Papuan-7	0	0.77150
160	S_Mongola-1	Papuan	S_Papuan-11	0	0.77146
161	S_Mongola-1	Estonian	S_Estonian-2	0	0.77144
162	S_Mongola-1	Papuan	S_Papuan-13	0	0.77131
163	S_Mongola-1	Tajik	S_Tajik-1	0	0.77131
164	S_Mongola-1	Papuan	S_Papuan-14	0	0.77129
165	S_Mongola-1	Hungarian	S_Hungarian-2	0	0.77120
166	S_Mongola-1	Czech	S_Czech-2	0	0.77120
167	S_Mongola-1	Papuan	S_Papuan-3	0	0.77119
168	S_Mongola-1	Icelandic	S_Icelandic-2	0	0.77119
169	S_Mongola-1	Hungarian	S_Hungarian-1	0	0.77111
170	S_Mongola-1	Polish	S_Polish-1	0	0.77110
171	S_Mongola-1	Bulgarian	S_Bulgarian-1	0	0.77106
172	S_Mongola-1	Greek	S_Greek-1	0	0.77103
173	S_Mongola-1	Iranian-Fars	S_Iranian-Fars-2	0	0.77103
174	S_Mongola-1	Papuan	S_Papuan-5	0	0.77101
175	S_Mongola-1	French	S_French-2	0	0.77082
176	S_Mongola-1	Georgian	S_Georgian-1	0	0.77071
177	S_Mongola-1	Balochi	S_Balochi-1	0	0.77062
178	S_Mongola-1	Spanish	S_Spanish-1	0	0.77061
179	S_Mongola-1	Armenian	S_Armenian-1	0	0.77054
180	S_Mongola-1	Papuan	S_Papuan-6	0	0.77049
181	S_Mongola-1	Bergamo	S_Bergamo-2	0	0.77017
182	S_Mongola-1	Papuan	S_Papuan-2	0	0.77008
183	S_Mongola-1	Bulgarian	S_Bulgarian-2	0	0.77007
184	S_Mongola-1	Papuan	S_Papuan-4	0	0.77005
185	S_Mongola-1	Spanish	S_Spanish-2	0	0.76981
186	S_Mongola-1	Greek	S_Greek-2	0	0.76981
187	S_Mongola-1	Basque	S_Basque-1	0	0.76979
188	S_Mongola-1	English	S_English-1	0	0.76977
189	S_Mongola-1	Lezgin	S_Lezgin-2	0	0.76975
190	S_Mongola-1	Tuscan	S_Tuscan-2	0	0.76960
191	S_Mongola-1	Albanian.DG	S_Albanian1	0	0.76953
192	S_Mongola-1	English	S_English-2	0	0.76951
193	S_Mongola-1	Armenian	S_Armenian-2	0	0.76950
194	S_Mongola-1	Sardinian	S_Sardinian-2	0	0.76946
195	S_Mongola-1	Orcadian	S_Orcadian-1	0	0.76909
196	S_Mongola-1	Tuscan	S_Tuscan-1	0	0.76906
197	S_Mongola-1	Jew_Iraqi	S_Iraqi_Jew-1	0	0.76901
198	S_Mongola-1	Basque	S_Basque-2	0	0.76888
199	S_Mongola-1	Georgian	S_Georgian-2	0	0.76886
200	S_Mongola-1	Jew_Iraqi	S_Iraqi_Jew-2	0	0.76865
201	S_Mongola-1	Jordanian	S_Jordanian-3	0	0.76809
202	S_Mongola-1	French	S_French-1	0	0.76796
203	S_Mongola-1	BedouinB	S_BedouinB-2	0	0.76779
204	S_Mongola-1	Druze	S_Druze-1	0	0.76757
205	S_Mongola-1	Druze	S_Druze-2	0	0.76754
206	S_Mongola-1	Makrani	S_Makrani-2	0	0.76747
207	S_Mongola-1	Jew_Yemenite	S_Yemenite_Jew-2	0	0.76622
208	S_Mongola-1	Jew_Yemenite	S_Yemenite_Jew-1	0	0.76575
209	S_Mongola-1	Sardinian	S_Sardinian-1	0	0.76564
210	S_Mongola-1	BedouinB	S_BedouinB-1	0	0.76460
211	S_Mongola-1	Jordanian	S_Jordanian-2	0	0.76413
212	S_Mongola-1	Samaritan	S_Samaritan-1	0	0.76396
213	S_Mongola-1	Jordanian	S_Jordanian-1	0	0.76261
214	S_Mongola-1	Saharawi	S_Saharawi-2	0	0.75981
215	S_Mongola-1	Saharawi	S_Saharawi-1	0	0.75964
216	S_Mongola-1	Mozabite	S_Mozabite-1	0	0.75937
217	S_Mongola-1	Mozabite	S_Mozabite-2	0	0.75824
222	S_Mongola-1	Somali	S_Somali-1	0	0.74788
224	S_Mongola-1	Masai	S_Masai-2	0	0.74381
226	S_Mongola-1	Masai	S_Masai-1	0	0.74274
232	S_Mongola-1	Gambian	S_Gambian-2	0	0.73200
233	S_Mongola-1	BantuKenya	S_BantuKenya-1	0	0.73139
234	S_Mongola-1	Luo	S_Luo-2	0	0.73107
235	S_Mongola-1	BantuKenya	S_BantuKenya-2	0	0.73020
236	S_Mongola-1	Luhya	S_Luhya-1	0	0.73005
237	S_Mongola-1	Luhya	S_Luhya-2	0	0.73002
238	S_Mongola-1	Mandenka	S_Mandenka-2	0	0.72934
239	S_Mongola-1	Gambian	S_Gambian-1	0	0.72933
240	S_Mongola-1	Esan	S_Esan-2	0	0.72920
241	S_Mongola-1	Yoruba	S_Yoruba-2	0	0.72879
242	S_Mongola-1	Mandenka	S_Mandenka-1	0	0.72872
243	S_Mongola-1	Yoruba	S_Yoruba-1	0	0.72816
244	S_Mongola-1	Esan	S_Esan-1	0	0.72810
245	S_Mongola-1	Mende	S_Mende-1	0	0.72793
246	S_Mongola-1	Mende	S_Mende-2	0	0.72788
247	S_Mongola-1	Biaka	S_Biaka-1	0	0.72484
248	S_Mongola-1	Biaka	S_Biaka-2	0	0.72347
249	S_Mongola-1	Mbuti	S_Mbuti-3	0	0.72046
250	S_Mongola-1	Mbuti	S_Mbuti-1	0	0.72010
251	S_Mongola-1	Mbuti	S_Mbuti-2	0	0.72005
252	S_Mongola-1	Khomani_San	S_Khomani_San-2	0	0.71521
253	S_Mongola-1	Ju_hoan_North	S_Ju_hoan_North-2	0	0.71514
254	S_Mongola-1	Ju_hoan_North	S_Ju_hoan_North-3	0	0.71460
255	S_Mongola-1	Khomani_San	S_Khomani_San-1	0	0.71302

PROOF G25 distances shouldn't be trusted

The late Paleolithic African paper showed that there was Eurasian geneflow back to Africa in the Paleolithic that affected pretty much all Africans including Mbuti. In other words even Mbuti got some Eurasian genes during the Paleolithic. Least affected were Khomani and Ju-Hoan.

The IBS list I posted accurately shows this by showing Mongola closer to Mbuti than to Khomani and Ju-Hoan.

The G25 (scaled), on the other hand, gets it all wrong. You can try it yourself. It wrongly shows Mongola significantly closer to Khomani-San than to Mbuti! If it gets this wrong, then how should its results for the pops be trusted?

Distance to: Mongola
0.918673 Khomani_San
0.98425066 Ju_hoan_North
0.99607508 Mbuti
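
The two rankings can be put side by side with a short Python sketch. It averages the per-individual IBS values from the list above for each of the three African populations and compares the resulting order with the G25 distances just quoted. The scales run in opposite directions: a higher IBS means more similar, a lower G25 distance means closer. Averaging the individuals per population is a simplifying assumption, not something the IBS run itself did.

```python
from statistics import mean

# Per-individual IBS similarity to S_Mongola-1, copied from the list above.
ibs_to_mongola = {
    "Mbuti": [0.72046, 0.72010, 0.72005],
    "Khomani_San": [0.71521, 0.71302],
    "Ju_hoan_North": [0.71514, 0.71460],
}

# Scaled G25 distances to Mongola, copied from the post.
g25_to_mongola = {
    "Khomani_San": 0.918673,
    "Ju_hoan_North": 0.98425066,
    "Mbuti": 0.99607508,
}

# Assumption: one value per population, the mean over its individuals.
ibs_mean = {pop: mean(vals) for pop, vals in ibs_to_mongola.items()}

# IBS is a similarity, so "closest first" means descending order.
ibs_order = sorted(ibs_mean, key=ibs_mean.get, reverse=True)
# G25 is a distance, so "closest first" means ascending order.
g25_order = sorted(g25_to_mongola, key=g25_to_mongola.get)

print("closest first by IBS:", ibs_order)
print("closest first by G25:", g25_order)
```

For these three populations the two orders come out exactly reversed, which is the disagreement described above.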

Here's additional proof something is not right with the G25. Everyone should know that Eurasians such as Kurds should be closest to other Eurasians and not Africans.

G25 also wrongly shows Kurds closer to Yorubans and Esans than to Papuans which is absurd. Additionally, G25 wrongly shows Kurds closer to Sudanese than to Karitiana and Surui.

Additionally, G25 wrongly shows Kurds closer to Jordanians than to E. Europeans and Uyghur. I can go on and on with the wrong rankings in G25.

NO	Kurdish	G25 Distance to:
1	Turkish_Kayseri	0.04594
2	Armenian_B	0.04996
3	Abkhasian	0.07100
4	Adygei	0.07185
5	Chechen	0.07279
6	Jordanian	0.09159
7	Balochi	0.12169
8	Albanian	0.12363
9	Brahui	0.12457
10	Bulgarian	0.13177
11	French_Al	0.16473
12	BedouinB	0.16728
13	Hungarian	0.16929
14	Czech	0.18128
15	Basque_French	0.19215
16	Finnish	0.21537
17	Mozabite	0.23311
18	Saharawi	0.26496
19	Uygur	0.28771
20	Hazara	0.28992
21	Kirghiz	0.39622
22	Jarawa	0.42858
23	Somali	0.43369
24	Mongolian	0.46764
25	Mongola	0.55815
26	Eskimo_Sireniki	0.56139
27	Japanese	0.58489
28	Sudanese	0.69730
29	Karitiana	0.71006
30	Surui	0.71489
31	Yoruba	0.74242
32	Esan_Nigeria	0.74434
33	Papuan	0.78951
34	Khomani_San	0.83812
35	Ju_hoan_North	0.90933
36	Mbuti	0.92566

**Zoro** · 03-10-2021, 06:50 PM

Unlike G25, the PLINK IBS gene-to-gene comparison correctly shows Kurds closer to other Eurasians (Papuans, Karitiana, Surui) than to SSA. It also correctly shows Kurds closer to E. Europeans, Baloch, Brahui, Hazara and Uyghur than to Jordanians, etc.

NO	POPULATION	DST
1	Lezgin	0.85119
2	Armenian	0.85040
3	Adygei	0.85039
4	Abkhasian	0.85027
5	Turkish-Kayseri	0.85012
6	Chechen	0.84983
7	Czech	0.84973
8	Hungarian	0.84956
9	Bulgarian	0.84940
10	French	0.84880
11	Basque	0.84860
12	Finnish	0.84860
13	Russian	0.84855
14	Estonian	0.84832
15	Sardinian	0.84817
16	Polish	0.84797
17	Pathan	0.84782
18	Tajik	0.84777
19	Kalash	0.84722
20	Sindhi	0.84702
21	Jew_Yemenite	0.84700
22	Tlingit	0.84695
23	Balochi	0.84675
24	Brahui	0.84615
25	Brahmin	0.84608
26	Samaritan	0.84603
27	BedouinB	0.84589
28	Saami	0.84589
29	Uyghur	0.84578
30	Makrani	0.84567
31	Mansi	0.84565
32	Bengali	0.84557
33	Punjabi	0.84517
34	Hazara	0.84498
35	Kyrgyz_Kyrgyzstan	0.84454
36	Jordanian	0.84422
37	Mala	0.84288
38	Tubalar	0.84250
39	Irula	0.84181
40	Even	0.84074
41	Mongola	0.84070
42	Tu	0.84029
43	Hezhen	0.84020
44	Mixtec	0.84018
45	Yakut	0.84000
46	Burmese	0.83998
47	Mexico_Zapotec.DG	0.83971
48	Xibo	0.83970
49	Naxi	0.83951
50	Han	0.83945
51	Korean	0.83923
52	Japanese	0.83898
53	Mayan	0.83886
54	Khonda_Dora	0.83884
55	Daur	0.83884
56	Tujia	0.83882
57	Quechua	0.83881
58	Eskimo_Sireniki.DG	0.83873
59	Oroqen	0.83861
60	Ulchi	0.83859
61	Eskimo_Naukan.DG	0.83855
62	She	0.83853
63	Miao	0.83845
64	Yi	0.83844
65	Itelmen	0.83824
66	Mixe	0.83819
67	Kinh	0.83813
68	China_Lahu	0.83783
69	Pima	0.83775
70	Thai	0.83774
71	Eskimo_Chaplin.DG	0.83767
72	Cambodian	0.83766
73	YANA_UP_WGS	0.83735
74	Dai	0.83730
75	Kusunda	0.83724
76	Piapoco	0.83703
77	Ami.DG	0.83696
78	Karitiana	0.83687
79	Surui	0.83654
80	Igorot	0.83649
81	Dusun	0.83639
82	Saharawi	0.83398
83	Mozabite	0.83287
84	Bougainville	0.83084
85	Papuan	0.82871
86	Somali	0.81444
87	Masai	0.80654
88	BantuKenya	0.79064
89	Luo	0.79045
90	Gambian	0.78966
91	Luhya	0.78919
92	Mandenka	0.78855
93	Esan	0.78710
94	Mende	0.78708
95	Yoruba	0.78690
96	Biaka	0.78118
97	Mbuti	0.77853
98	Ju_hoan_North	0.77354
99	Khomani_San	0.77330

**Lucas** · 03-10-2021, 09:00 PM

Originally Posted by Zoro

Unlike G25, the PLINK IBS gene-to-gene comparison correctly shows Kurds closer to other Eurasians (Papuans, Karitiana, Surui) than to SSA. It also correctly shows Kurds closer to E. Europeans, Baloch, Brahui, Hazara and Uyghur than to Jordanians, etc.

Zoro, you're somewhat comparing apples to oranges: a list of Euclidean distances based on PCA values versus a direct gene-to-gene comparison.
Even if IBS were better for distances between pops, you can't make an admixture breakdown with it, which is what most people like.

**Zoro** · 03-10-2021, 10:27 PM

Originally Posted by Lucas

Zoro, you're somewhat comparing apples to oranges: a list of Euclidean distances based on PCA values versus a direct gene-to-gene comparison.
Even if IBS were better for distances between pops, you can't make an admixture breakdown with it, which is what most people like.

One way to reword what you just said: a one-to-one, gene-to-gene comparison using IBS is a more accurate method than G25 or an Admixture calculator for determining genetic similarity between two pops, say Kurds and Bulgarians or Mongolians.

I'm reminded of something Dilawer told me a while back. He said Admixture or PCA based methods don't accurately portray genetic similarity between 2 populations like one to one IBS comparison. They just cluster based on geography and not based on genes. That's partly the reason why individuals in a population have all sorts of phenotypes but Admixture or PCA still clusters them together.

Although PCA or Admixture groups Kurds or Poles into clusters, individual Poles or Kurds may show widely differing similarity to Siberians or E. Asians, and what the calculator or the G25 PCA reports depends on which components it uses or which samples went into it. By contrast, IBS results don't depend on any of that; they are unaffected by which other samples are in the run.

This may in fact align more closely with their phenotypes than G25 or Admixture results, which would place the Poles or Kurds within clusters that do not explain their individual phenotypes the way IBS would.

**~~Komintasavalta~~** · 03-10-2021, 11:43 PM

Originally Posted by Lucas

Did you create the merged dataset yourself, or did you find it somewhere?

It's from this post by Razib Khan: https://www.gnxp.com/WordPress/2018/...n-one-command/.

Originally Posted by Lucas

Even if IBS were better for distances between pops, you can't make an admixture breakdown with it, which is what most people like.

Khvorykh et al. 2020 even did admixture-style analysis based on the number of shared IBD segments: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7696950/:

The fourth stage of our computations is unique to this research and was absent in Fedorova et al. 2016. In this stage, we created Supplementary Table S4 using the program rankingATLAS2_v9.pl, and the data from the Supplementary Table S1 ("IBD Normalized Numbers"). Supplementary Table S4 presents the percentages of relative relatedness of each population to the nine Distinct Human Genetic Regions (DHGRs) (AFE, AFW, AMR, EUR, ARC, EAS, OCE, SAS, and MDE, see Results section). For each population (e.g., Georgia) the program counts the numbers of shared IBD fragments per pair of individuals for this population with the three representatives of DHGR region and then makes a sum of these three numbers. For example, the for the AFE region, the summing number of shared IBDs will be the following: 0.48 IBDs (per pair for Georgia vs. LWK) + 0.92 (Georgia vs. Din_AFR) + 3.12 (Georgia vs. Mas_AFR) = 4.52 (for the AFE group). And so on for each DHGR group. In order to minimize the Founder effect in our calculations, we created an upper threshold of 100 shared IBD segments for any populational pair. For example, in a calculation of Congo (Con_AFR) vs. LWK, the original value was 151.9, however, with the threshold in place, the program changed the value to 100). Finally, we calculated the relative percentages for all 9 components (AFE, AFW, AMR, EUR, ARC, EAS, OCE, SAS, and MDE) in a way that ensured their sum was always 100%. Ranking data for each population (as presented in Table 2) were also obtained by rankingATLAS2_v9.pl.
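
The arithmetic in the quoted passage can be sketched in Python as follows. Only the AFE figures for Georgia (0.48, 0.92, 3.12), the 151.9 value, and the cap of 100 come from the paper; the EUR and EAS numbers and the function names are made up for illustration, and the real program (rankingATLAS2_v9.pl) may differ in detail.

```python
CAP = 100.0  # upper threshold on shared IBD segments per population pair

def region_sum(pair_values, cap=CAP):
    """Sum the per-pair IBD counts for one region, capping each at `cap`."""
    return sum(min(v, cap) for v in pair_values)

def relative_percentages(region_values):
    """Rescale per-region sums so that they add up to 100%."""
    sums = {region: region_sum(vals) for region, vals in region_values.items()}
    total = sum(sums.values())
    return {region: 100.0 * s / total for region, s in sums.items()}

georgia = {
    "AFE": [0.48, 0.92, 3.12],   # from the paper: LWK, Din_AFR, Mas_AFR
    "EUR": [20.0, 25.0, 30.0],   # hypothetical values
    "EAS": [1.0, 1.5, 2.0],      # hypothetical values
}

afe_sum = region_sum(georgia["AFE"])
print(round(afe_sum, 2))          # 4.52, as in the paper's example
print(region_sum([151.9]))        # 100.0, the capped Con_AFR vs. LWK value

shares = relative_percentages(georgia)
print({region: round(p, 1) for region, p in shares.items()})
```

With all nine DHGR regions filled in from Supplementary Table S1, the same rescaling would give one row of Table S4.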

Here's a graph I made of some populations from Khvorykh's table S4:

Code:

curl -Ls pastebin.com/raw/BmNdqWvi|tr -d \\r>/tmp/tables4
printf %s\\n Sau_MDE Ira_MDE Rom_EUR Gre_EUR Ger_EUR GBR_EUR Swe_EUR Lat_EUR Rus_EUR Est_EUR Fin_EUR FIN_EUR Ing_EUR Kar_EUR Vep_EUR Saa_EUR Mor_EUR Kom_EUR Udm_EUR Mar_EUR Mis_EUR Kry_EUR Tat_EUR Chu_EUR BSh_EUR Man_SIB Kha_SIB Tun_SIB For_SIB Nen_SIB  Nga_SIB Bur_SIB Yak_SIB Ale_ARC>/tmp/pop
awk -F, 'NR==1{print;next}NR==FNR{a[$1]=$0;next}$1 in a{print a[$1]}' /tmp/tables4 /tmp/pop|awk -F, -v OFS=, '{print$2,$6,$11,$10,$7,$8,$5,$9,$3,$4}'>/tmp/a
R -e 'library("ggplot2")
library("reshape2")
library("forcats") # provides fct_rev

t=read.csv("/tmp/a",header=T,check.names=F)

t2=melt(t,id.vars="Population")

# label each segment with its rounded percentage, hiding values of 2 or less
lab=round(t2$value)
lab[lab<=2]=""
t2$lab=lab
t2$value=t2$value/100

p=ggplot(t2,aes(x=fct_rev(factor(Population,levels=unique(Population))),y=value,fill=variable))+
geom_bar(stat="identity",width=1,position=position_fill(reverse=T))+
geom_text(aes(label=lab),position=position_stack(vjust=.5,reverse=T),size=2.5)+
coord_flip()+
theme(
  axis.text=element_text(color="black"),
  axis.text.x=element_blank(),
  axis.ticks=element_blank(),
  axis.title.x=element_blank(),
  legend.margin=margin(0),
  legend.title=element_blank(),
  panel.background=element_rect(fill="white")
)+
xlab("")+
scale_x_discrete(expand=c(0,0))+
scale_y_continuous(expand=c(0,0))

ggsave("/tmp/a.png",p,width=6,height=7)'

The proportion of the Northern European component was defined based on the number of shared IBD segments with Estonians, Germans, and Swedes. So for example Swedes have a higher proportion of the Northern European component than Latvians.

**~~Komintasavalta~~** · 03-11-2021, 12:14 AM

BTW what was G25 made with? The AG user anglesqueville said it was made with SmartPCA (https://anthrogenica.com/showthread....ean-bias/page2):

G25 is not a so-called "calculator", it is a PCA calculated directly on a large "raw data" database (of allele readings) using a well-known program (smartpca, Eigensoft package, Nick Patterson).

However when I tried googling "smartpca site:eurogenes.blogspot.com", there were only two hits, neither of which even matched text written by Davidski.

It's possible to encode a 10,000 by 10,000 matrix of distances between populations as a 10,000 by 25 matrix where the columns are PC components. Then you can retrieve the original distances between two rows of the table fairly accurately by calculating the Euclidean distance between the rows.

For example, here I generated an 11 by 11 matrix of FST distances:

Code:

R -e 'library(admixtools);
f2m=function(x){t=as.data.frame(x[,1:3]);t2=rbind(t,setNames(t[,c(2,1,3)],names(t)));xtabs(t2[,3]~t2[,2]+t2[,1])};
fst=fst("g/v44.3_1240K_public/v44.3_1240K_public",c("Biaka.DG","Even.DG","Finnish.DG","Ju_hoan_North.DG","Khomani_San.DG","Korean.DG","Mbuti.DG","Mongola.DG","Papuan.DG","Turkey_N.DG","Yoruba.DG"));
write.csv(round(f2m(fst),6),"fst",quote=F)'
$ cat fst
,Biaka.DG,Even.DG,Finnish.DG,Ju_hoan_North.DG,Khomani_San.DG,Korean.DG,Mbuti.DG,Mongola.DG,Papuan.DG,Turkey_N.DG,Yoruba.DG
Biaka.DG,0,0.212276,0.182032,0.086521,0.093686,0.208092,0.055175,0.200832,0.264921,0.19757,0.037891
Even.DG,0.212276,0,0.099165,0.260155,0.269936,0.027304,0.243293,0.020451,0.188681,0.138516,0.189624
Finnish.DG,0.182032,0.099165,0,0.22675,0.236001,0.102589,0.211397,0.089601,0.188651,0.03734,0.156253
Ju_hoan_North.DG,0.086521,0.260155,0.22675,0,0.034955,0.255676,0.102751,0.247671,0.311007,0.244202,0.108353
Khomani_San.DG,0.093686,0.269936,0.236001,0.034955,0,0.264307,0.110281,0.256679,0.319966,0.253402,0.115599
Korean.DG,0.208092,0.027304,0.102589,0.255676,0.264307,0,0.238141,0.001142,0.178226,0.136865,0.184756
Mbuti.DG,0.055175,0.243293,0.211397,0.102751,0.110281,0.238141,0,0.230583,0.294664,0.228177,0.077978
Mongola.DG,0.200832,0.020451,0.089601,0.247671,0.256679,0.001142,0.230583,0,0.171326,0.130389,0.176566
Papuan.DG,0.264921,0.188681,0.188651,0.311007,0.319966,0.178226,0.294664,0.171326,0,0.215617,0.241977
Turkey_N.DG,0.19757,0.138516,0.03734,0.244202,0.253402,0.136865,0.228177,0.130389,0.215617,0,0.172992
Yoruba.DG,0.037891,0.189624,0.156253,0.108353,0.115599,0.184756,0.077978,0.176566,0.241977,0.172992,0

Classical multidimensional scaling (MDS) produces the same coordinates as PCA when the distances are Euclidean; the difference is that it takes a distance matrix as input. I used MDS to reduce the distance matrix to three principal components:

Code:

$ R -e 't=read.csv("fst",row.names=1,header=T);cmdscale(as.dist(t),k=3)'
                        [,1]         [,2]          [,3]
Biaka.DG          0.09458067 -0.009318035  0.0007634203
Even.DG          -0.10587237  0.033672133 -0.0493091783
Finnish.DG       -0.06971126  0.039180919  0.0443036464
Ju_hoan_North.DG  0.14384037 -0.005407783 -0.0079752958
Khomani_San.DG    0.15305612 -0.005072182 -0.0095401289
Korean.DG        -0.10263674  0.022172427 -0.0479094108
Mbuti.DG          0.12082958 -0.006742200 -0.0017669591
Mongola.DG       -0.09712661  0.017649424 -0.0402117613
Papuan.DG        -0.13332805 -0.137725617  0.0231446908
Turkey_N.DG      -0.07026603  0.060365299  0.0804792633
Yoruba.DG         0.06663432 -0.008774385  0.0080217135

Then even though there are only 3 principal components, I can still retrieve the original distance between a pair of populations fairly accurately:

Code:

$ R -e 't=read.csv("fst",row.names=1,header=T);c=cmdscale(as.dist(t),k=3);sqrt(sum((c["Biaka.DG",]-c["Even.DG",])^2))'
[1] 0.2110375

With 25 components, it's possible to encode the distances even between tens of thousands of populations more or less accurately. If more components were needed, you could just as well make a G50 or G100 or something.
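
As a rough cross-check, the round trip can be repeated in Python from the three MDS coordinates printed above and compared against the FST values in the matrix. The choice of pairs is arbitrary, and the sketch only reuses numbers already shown in this post.

```python
from math import dist

# Three MDS coordinates per population, copied from the cmdscale output above.
coords = {
    "Biaka.DG": (0.09458067, -0.009318035, 0.0007634203),
    "Even.DG": (-0.10587237, 0.033672133, -0.0493091783),
    "Khomani_San.DG": (0.15305612, -0.005072182, -0.0095401289),
    "Korean.DG": (-0.10263674, 0.022172427, -0.0479094108),
    "Mbuti.DG": (0.12082958, -0.006742200, -0.0017669591),
    "Mongola.DG": (-0.09712661, 0.017649424, -0.0402117613),
}

# Original FST values for the same pairs, copied from the matrix above.
fst = {
    ("Biaka.DG", "Even.DG"): 0.212276,
    ("Korean.DG", "Mongola.DG"): 0.001142,
    ("Mbuti.DG", "Khomani_San.DG"): 0.110281,
}

# Euclidean distance between the two coordinate rows of each pair.
recovered = {pair: dist(coords[pair[0]], coords[pair[1]]) for pair in fst}

for pair, original in fst.items():
    print(pair, "FST:", original, "from 3 dimensions:", round(recovered[pair], 6))
```

Biaka vs. Even comes back close to the original, while the other two pairs are noticeably off with only three components, which is the argument for keeping more of them.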

**Zoro** · 03-11-2021, 01:53 AM

Originally Posted by Komintasavalta

BTW what was G25 made with? The AG user anglesqueville said it was made with SmartPCA (https://anthrogenica.com/showthread....ean-bias/page2):

G25 is not a so-called "calculator", it is a PCA calculated directly on a large "raw data" database (of allele readings) using a well-known program (smartpca, Eigensoft package, Nick Patterson).

However when I tried googling "smartpca site:eurogenes.blogspot.com", there were only two hits, neither of which even matched text written by Davidski.

It's possible to encode a 10,000 by 10,000 matrix of distances between populations as a 10,000 by 25 matrix where the columns are PC components. Then you can retrieve the original distances between two rows of the table fairly accurately by calculating the Euclidean distance between the rows.

For example, here I generated an 11 by 11 matrix of FST distances:

Code:

R -e 'library(admixtools);
f2m=function(x){t=as.data.frame(x[,1:3]);t2=rbind(t,setNames(t[,c(2,1,3)],names(t)));xtabs(t2[,3]~t2[,2]+t2[,1])};
fst=fst("g/v44.3_1240K_public/v44.3_1240K_public",c("Biaka.DG","Even.DG","Finnish.DG","Ju_hoan_North.DG","Khomani_San.DG","Korean.DG","Mbuti.DG","Mongola.DG","Papuan.DG","Turkey_N.DG","Yoruba.DG"));
write.csv(round(f2m(fst),6),"fst",quote=F)'
$ cat fst
,Biaka.DG,Even.DG,Finnish.DG,Ju_hoan_North.DG,Khomani_San.DG,Korean.DG,Mbuti.DG,Mongola.DG,Papuan.DG,Turkey_N.DG,Yoruba.DG
Biaka.DG,0,0.212276,0.182032,0.086521,0.093686,0.208092,0.055175,0.200832,0.264921,0.19757,0.037891
Even.DG,0.212276,0,0.099165,0.260155,0.269936,0.027304,0.243293,0.020451,0.188681,0.138516,0.189624
Finnish.DG,0.182032,0.099165,0,0.22675,0.236001,0.102589,0.211397,0.089601,0.188651,0.03734,0.156253
Ju_hoan_North.DG,0.086521,0.260155,0.22675,0,0.034955,0.255676,0.102751,0.247671,0.311007,0.244202,0.108353
Khomani_San.DG,0.093686,0.269936,0.236001,0.034955,0,0.264307,0.110281,0.256679,0.319966,0.253402,0.115599
Korean.DG,0.208092,0.027304,0.102589,0.255676,0.264307,0,0.238141,0.001142,0.178226,0.136865,0.184756
Mbuti.DG,0.055175,0.243293,0.211397,0.102751,0.110281,0.238141,0,0.230583,0.294664,0.228177,0.077978
Mongola.DG,0.200832,0.020451,0.089601,0.247671,0.256679,0.001142,0.230583,0,0.171326,0.130389,0.176566
Papuan.DG,0.264921,0.188681,0.188651,0.311007,0.319966,0.178226,0.294664,0.171326,0,0.215617,0.241977
Turkey_N.DG,0.19757,0.138516,0.03734,0.244202,0.253402,0.136865,0.228177,0.130389,0.215617,0,0.172992
Yoruba.DG,0.037891,0.189624,0.156253,0.108353,0.115599,0.184756,0.077978,0.176566,0.241977,0.172992,0

Classical multidimensional scaling (MDS) produces the same coordinates as PCA when the distances are Euclidean; the difference is that it takes a distance matrix as input. I used MDS to reduce the distance matrix to three principal components:

Code:

$ R -e 't=read.csv("fst",row.names=1,header=T);cmdscale(as.dist(t),k=3)'
                        [,1]         [,2]          [,3]
Biaka.DG          0.09458067 -0.009318035  0.0007634203
Even.DG          -0.10587237  0.033672133 -0.0493091783
Finnish.DG       -0.06971126  0.039180919  0.0443036464
Ju_hoan_North.DG  0.14384037 -0.005407783 -0.0079752958
Khomani_San.DG    0.15305612 -0.005072182 -0.0095401289
Korean.DG        -0.10263674  0.022172427 -0.0479094108
Mbuti.DG          0.12082958 -0.006742200 -0.0017669591
Mongola.DG       -0.09712661  0.017649424 -0.0402117613
Papuan.DG        -0.13332805 -0.137725617  0.0231446908
Turkey_N.DG      -0.07026603  0.060365299  0.0804792633
Yoruba.DG         0.06663432 -0.008774385  0.0080217135

Then even though there are only 3 principal components, I can still retrieve the original distance between a pair of populations fairly accurately:

Code:

$ R -e 't=read.csv("fst",row.names=1,header=T);c=cmdscale(as.dist(t),k=3);sqrt(sum((c["Biaka.DG",]-c["Even.DG",])^2))'
[1] 0.2110375

With 25 components, it's possible to encode the distances even between tens of thousands of populations more or less accurately. If more components were needed, you could just as well make a G50 or G100 or something.

Very good. You're thinking outside the box! Yes, of course you can make a calculator based on FST or IBS. You can run IBS between the target and WHG, ENF, ANS, etc., and even square the individual results to create bigger differences between them, then assign each a prorated share of 100%.
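
A minimal sketch of that idea in Python, assuming the scheme is simply: take the IBS similarity of the target to each reference, square it, and rescale the squares to sum to 100%. The IBS values below are placeholders, not real results, and the function name is invented for illustration.

```python
def prorated_shares(ibs_to_refs, power=2):
    """Raise each IBS similarity to `power` and rescale to percentages."""
    weights = {ref: value ** power for ref, value in ibs_to_refs.items()}
    total = sum(weights.values())
    return {ref: 100.0 * w / total for ref, w in weights.items()}

# Hypothetical IBS similarities between one target and three references.
target_ibs = {"WHG": 0.840, "ENF": 0.850, "ANS": 0.830}

shares = prorated_shares(target_ibs)
for ref, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ref}: {pct:.1f}%")
```

Since the IBS values in this thread sit in a narrow range (roughly 0.71 to 0.85), squaring alone may not spread the shares much; a higher `power` or subtracting a baseline first would be further assumptions to test.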

At least it wouldn't have the biases and variability of results like G25 or Admixture where the results depend on the other samples in the runs.