Behar Romanians: Weirdest Academic Samples



Ion Basescul
10-02-2020, 05:26 PM
These guys were collected back in 2010 as part of this study (https://www.nature.com/articles/nature09103).
Some of you might have seen graphs like these in which they are featured.

https://i.ibb.co/qrWwgWw/image.png

https://2.bp.blogspot.com/_Ish7688voT0/TA_8VX3jGkI/AAAAAAAACcM/HVkOLdPm94g/s1600/admixture-global.jpg

https://1.bp.blogspot.com/_Ish7688voT0/TBDgV2r3hxI/AAAAAAAACck/sYi1shNB8bc/s1600/westeurasianpca.jpg

There are two Gypsies/Romas in the 16-sample dataset, as you can see in the PCA above, but those score as expected in Eurogenes K13.

It's the Romanians that have weird results. Some of them don't look out of the ordinary, while others have an unbalanced distribution between North Atlantic and Baltic. They don't look like any other population that I know of in Europe and beyond.

Behold the 14 Romanians
Blue: those that score like regular Romanians
Red: impossible mix, or I've no idea what populations you need to mix to get such a large discrepancy between North Atlantic and Baltic
Green: Jewish or Greek immediate family

Name           N_Atlantic  Baltic  West_Med  West_Asian  East_Med  Red_Sea  South_Asian  East_Asian  Siberian  Amerindian  Oceanian  NE_African  Sub-Saharan  N_Atlantic+Baltic
Romania1.txt        24.66   23.90     16.52       10.04     19.87     3.28         0.00        0.48      0.68        0.00      0.58        0.00         0.00              48.56
Romania2.txt        14.94   30.25     18.10       12.55     17.83     1.55         1.44        1.79      0.00        0.80      0.37        0.00         0.40              45.19
Romania4.txt        13.21   35.01     20.29        9.91     14.14     4.96         0.00        1.24      0.40        0.28      0.56        0.00         0.00              48.22
Romania5.txt         7.59   40.42     12.72        3.67     32.75     0.93         0.00        0.00      1.72        0.00      0.00        0.20         0.00              48.01
Romania6.txt        15.01   27.66     21.48        7.86     23.12     0.00         1.98        0.00      0.66        1.17      1.05        0.00         0.00              42.67
Romania8.txt        21.00   25.37     16.21       13.17     19.20     2.18         0.90        0.15      0.81        0.91      0.00        0.09         0.00              46.37
Romania9.txt        23.61   24.26     17.82        3.69     27.06     0.00         1.31        0.42      0.57        1.02      0.23        0.00         0.00              47.87
Romania10.txt       17.69   34.20     13.56        6.35     24.82     0.78         0.43        0.00      1.03        0.89      0.00        0.00         0.24              51.89
Romania11.txt       27.41   24.25     18.52       10.78     15.36     0.83         0.22        0.00      1.45        0.85      0.00        0.00         0.33              51.66
Romania12.txt        7.12   34.64     22.99       11.43     22.03     0.00         0.00        0.69      0.43        0.47      0.20        0.00         0.00              41.76
Romania13.txt       19.21   32.76     21.60        6.80     14.84     2.37         0.00        0.00      0.35        1.51      0.56        0.00         0.00              51.97
Romania14.txt        5.29   35.53     25.56        6.65     23.24     0.00         2.14        0.18      1.06        0.06      0.30        0.00         0.00              40.82
Romania15.txt        6.62   35.85     26.08       13.27     14.14     1.10         0.00        0.00      2.25        0.13      0.56        0.00         0.00              42.47
Romania16.txt       13.64   40.86     16.74        7.82     17.75     0.00         0.91        0.00      1.57        0.62      0.10        0.00         0.00              54.50
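The N_Atlantic/Baltic imbalance in the K13 table above can be put into numbers. Here is a minimal Python sketch that uses only the values listed in the table: it re-checks the N_Atlantic+Baltic column and ranks the 14 Romanians by their Baltic/N_Atlantic ratio. The 2.5 cut-off (`RATIO_CUTOFF`) is a hypothetical threshold picked for illustration. It does not reproduce the blue/red/green colour coding from the post.

```python
# Eurogenes K13 scores for the 14 Behar Romanians, copied from the table above:
# (N_Atlantic, Baltic, reported N_Atlantic+Baltic).
K13 = {
    "Romania1": (24.66, 23.90, 48.56),
    "Romania2": (14.94, 30.25, 45.19),
    "Romania4": (13.21, 35.01, 48.22),
    "Romania5": (7.59, 40.42, 48.01),
    "Romania6": (15.01, 27.66, 42.67),
    "Romania8": (21.00, 25.37, 46.37),
    "Romania9": (23.61, 24.26, 47.87),
    "Romania10": (17.69, 34.20, 51.89),
    "Romania11": (27.41, 24.25, 51.66),
    "Romania12": (7.12, 34.64, 41.76),
    "Romania13": (19.21, 32.76, 51.97),
    "Romania14": (5.29, 35.53, 40.82),
    "Romania15": (6.62, 35.85, 42.47),
    "Romania16": (13.64, 40.86, 54.50),
}

# Hypothetical cut-off for "unbalanced": Baltic more than 2.5x N_Atlantic.
# Illustration only; this number does not come from the thread.
RATIO_CUTOFF = 2.5


def summarize(samples, cutoff=RATIO_CUTOFF):
    """Return (name, sum, Baltic/N_Atlantic ratio, flagged) rows, highest ratio first."""
    rows = []
    for name, (n_atlantic, baltic, reported_sum) in samples.items():
        total = n_atlantic + baltic
        # The spreadsheet's sum column should match to within rounding.
        if abs(total - reported_sum) > 0.005:
            raise ValueError(f"{name}: {total:.2f} != {reported_sum:.2f}")
        ratio = baltic / n_atlantic
        rows.append((name, total, ratio, ratio > cutoff))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


if __name__ == "__main__":
    for name, total, ratio, flagged in summarize(K13):
        note = "  <- unbalanced" if flagged else ""
        print(f"{name:<10} N_Atlantic+Baltic={total:6.2f}  Baltic/N_Atlantic={ratio:5.2f}{note}")
```

The sums all check out. So the oddity is not the combined N_Atlantic+Baltic total, which stays in a similar range. It is how that total is split between the two components.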
2 full Romas/Gypsies, who look normal compared to the others that I have seen:

Name               N_Atlantic  Baltic  West_Med  West_Asian  East_Med  Red_Sea  South_Asian  East_Asian  Siberian  Amerindian  Oceanian  NE_African  Sub-Saharan
Romania3_Roma.txt       14.91   11.19     11.66       15.92     16.75     3.86        22.30        1.08      0.98        0.00      0.72        0.64         0.00
Romania7_Roma.txt        3.94    8.70     12.96       16.67     23.85     2.36        27.17        2.57      0.00        0.39      1.40        0.00         0.00
I have asked the data manager at the Estonian Biocentre to confirm the localities from where they were collected, since that is not mentioned in the study.

Leto
10-02-2020, 05:33 PM
Yes, some of them are straight up impossible. By the way, the Roma are also weird, see the North Atlantic difference between them.

vbnetkhio
10-02-2020, 05:33 PM
that looks like the calculator effect, those 5 were probably used as reference samples for Eurogenes k13.

Leto
10-02-2020, 05:35 PM
that looks like the calculator effect, those 5 were probably used as reference samples for Eurogenes k13.
Behar was on Dodecad; I had it replaced with a more realistic average. Dodecad is full of such weird-ass averages.

Seya
10-02-2020, 05:38 PM
I've seen such results before, even with 2% North Atlantic.

Ion Basescul
10-02-2020, 05:43 PM
that looks like the calculator effect, those 5 were probably used as reference samples for Eurogenes k13.

Does that even matter? Aren't they just compared against a single result, which is the average with like 24% NA and 24% Baltic?

WeirdLookingFellow
10-02-2020, 05:57 PM
Why are these studies not even using 30 samples? I don't know about Genetics 101, but Statistics 101 says that if you don't have 30 samples you might as well give up: you can't run more than a Student's t-test and expect the results to be worth anything.

I get that the Baltic + N Atlantic ends up being in the Basescu average more or less but this is ridiculous, at least for me.
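To put the sample-size worry into numbers, here is a minimal sketch. It computes the mean N_Atlantic score of the 14 Romanians from the K13 table with a 95% confidence interval based on Student's t distribution. Two assumptions: the 14 scores are treated as an independent random sample, and the critical value t(0.975, 13) ≈ 2.160 is hard-coded from a standard t table rather than computed.

```python
import math
import statistics

# Eurogenes K13 N_Atlantic scores of the 14 Behar Romanians (from the table above).
N_ATLANTIC = [24.66, 14.94, 13.21, 7.59, 15.01, 21.00, 23.61,
              17.69, 27.41, 7.12, 19.21, 5.29, 6.62, 13.64]

# Two-sided 95% critical value of Student's t for n - 1 = 13 degrees of freedom,
# taken from a standard t table (assumed value, not computed here).
T_CRIT_95_DF13 = 2.160


def mean_with_ci(values, t_crit):
    """Sample mean and a t-based confidence interval for the population mean."""
    n = len(values)
    mean = statistics.mean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    half_width = t_crit * standard_error
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    mean, low, high = mean_with_ci(N_ATLANTIC, T_CRIT_95_DF13)
    print(f"N_Atlantic mean {mean:.2f}, 95% CI {low:.2f} to {high:.2f} (n={len(N_ATLANTIC)})")
```

The interval shows how precisely 14 samples pin down the mean. It says nothing about whether the calculator itself is biased for these particular samples.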

vbnetkhio
10-02-2020, 06:04 PM
Does that even matter? Aren't they just compared against a single result, which is the average with like 24% NA and 24% Baltic?

i mean the reference samples for sourcing the components. The "Baltic", "North Atlantic" etc, are partially built from those 5 samples' dna.

when you run reference samples through the calculator which was built from those very samples, they get weird results like these. that's the calculator effect.

i think 6, 10 and 16 are also probably reference samples.
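The circularity vbnetkhio describes can be shown with a toy analogue. This is a simplified sketch, not how Eurogenes K13 or any ADMIXTURE-based calculator builds its components. Here a "reference" is just the mean of a few samples, and the two-component vectors in `REFS` are made-up, hypothetical numbers. For a plain mean of n samples, a sample that is part of the reference sits exactly (n-1)/n times as far from it as it would from the mean of the other samples. It therefore always looks like a better fit than it really is.

```python
import math

# Hypothetical two-component scores for a tiny "reference panel" (made up).
REFS = [(10.0, 30.0), (20.0, 25.0), (30.0, 20.0), (40.0, 45.0)]


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance_in_and_out(refs, i):
    """Distance from refs[i] to the reference mean, with and without refs[i] in it."""
    sample = refs[i]
    d_in = math.dist(sample, mean(refs))
    d_out = math.dist(sample, mean(refs[:i] + refs[i + 1:]))
    return d_in, d_out


if __name__ == "__main__":
    for i in range(len(REFS)):
        d_in, d_out = distance_in_and_out(REFS, i)
        print(f"sample {i}: included {d_in:.3f}, left out {d_out:.3f}")
```

Real calculators fit allele frequencies rather than averaging scores, so the size of the effect there is not (n-1)/n. The direction is the same, though: the samples that shaped a component are not a fair test of it.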

Ion Basescul
10-02-2020, 06:19 PM
I've seen such results before, even with 2% North Atlantic.

2% is impossible even for full Romas. The problem is the difference between Baltic and North Atlantic, because the Baltic source also carries a lot of Atlantic. 35% Baltic and 5% NA is impossible to reproduce with modern populations.
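Ion's argument is essentially about linear mixing. If every plausible source population that carries Baltic also carries a lot of North Atlantic, no mixture of them can end up with far more Baltic than North Atlantic. For a weighted mix, the N_Atlantic/Baltic ratio is a weighted average of the sources' ratios, so it can never fall outside their range. Below is a minimal sketch. The three source profiles in `SOURCES` are hypothetical numbers chosen for illustration, not real K13 population averages.

```python
import random

# Hypothetical (N_Atlantic, Baltic) K13-style profiles for three source
# populations; illustrative numbers only, not real population averages.
SOURCES = [(30.0, 20.0), (20.0, 30.0), (15.0, 35.0)]


def mix(sources, weights):
    """(N_Atlantic, Baltic) of a mixture; weights are non-negative and sum to 1."""
    n_atlantic = sum(w * na for w, (na, _) in zip(weights, sources))
    baltic = sum(w * b for w, (_, b) in zip(weights, sources))
    return n_atlantic, baltic


def ratio_range(sources):
    """Smallest and largest N_Atlantic/Baltic ratio among the sources."""
    ratios = [na / baltic for na, baltic in sources]
    return min(ratios), max(ratios)


if __name__ == "__main__":
    low, high = ratio_range(SOURCES)
    target = 5.0 / 35.0  # the "35% Baltic and 5% NA" case from the post
    print(f"any mix has N_Atlantic/Baltic in [{low:.3f}, {high:.3f}]; target is {target:.3f}")
    rng = random.Random(0)
    for _ in range(3):
        raw = [rng.random() for _ in SOURCES]
        weights = [r / sum(raw) for r in raw]
        na, baltic = mix(SOURCES, weights)
        print(f"weights {[round(w, 2) for w in weights]} -> ratio {na / baltic:.3f}")
```

With real data, the question becomes whether any modern population average has a ratio as low as 5/35. If none does, no mix of them can reach it.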

Seya
10-02-2020, 06:22 PM
2% is impossible even for full Romas. The problem is the difference between Baltic and North Atlantic, because the Baltic source also carries a lot of Atlantic.

it was a huge list with kit numbers. if you dig the forum you may find them. there were kits from all over the balkans. the ones with very very low north atlantic had very high baltic and east med.

Peterski
10-02-2020, 06:23 PM
Some of them are mixed with Gypsies IIRC.

Ion Basescul
10-02-2020, 06:24 PM
it was a huge list with kit numbers. if you dig the forum you may find them. there were kits from all over the balkans. the ones with very very low north atlantic had very high baltic and east med.

Probably some kind of damaged kits.

Ion Basescul
10-02-2020, 06:25 PM
Some of them are mixed with Gypsies IIRC.

It looks like none are. There are just 2 full Gypsies and 14 unmixed Romanians, out of whom some have very weird differences between North Atlantic and Baltic.

vbnetkhio
10-02-2020, 06:25 PM
it was a huge list with kit numbers. if you dig the forum you may find them. there were kits from all over the balkans. the ones with very very low north atlantic had very high baltic and east med.

those were academic samples uploaded by Ajeje Brazorf, including these Romanians. some others on the list also suffered from the calculator effect.

https://www.theapricity.com/forum/showthread.php?245352-Random-GEDmatch-kits

they are deleted from gedmatch now.

Ion Basescul
10-02-2020, 06:28 PM
those were academic samples uploaded by Ajeje Brazorf, including these Romanians. some others on the list also suffered from the calculator effect.

https://www.theapricity.com/forum/showthread.php?245352-Random-GEDmatch-kits

they are deleted from gedmatch now.

Unsurprisingly, all of those seem to be from Behar and Yunusbayev at the Estonian Biocentre. So, I guess the best conclusion for now is that they produce such weird results because David used them as references.
Do you guys know of a calculator that didn't use them? We could try and see if that's the reason.

Leto
10-02-2020, 06:35 PM
Unsurprisingly, all of those seem to be from Behar and Yunusbayev at the Estonian Biocentre. So, I guess the best conclusion for now is that they produce such weird results because David used them as references.
Do you guys know of a calculator that didn't use them? We could try and see if that's the reason.
Maybe MDLP K16?

Peterski
10-02-2020, 06:36 PM
It looks like none are.

OK maybe those were Romanians from another study that were mixed with Gypsies.

Ion Basescul
10-02-2020, 06:42 PM
Maybe MDLP K16?

Yeah, vbnetkhio is right. Those with weird results are being used as references for Eurogenes K13, resulting in a calculator effect.

Romania5 (looks like a person from Muntenia)

puntDNAL K13


# Population Percent
1 NE_Europe 36.66
2 SW_Europe 30.04
3 West_Asia 18.95
4 SW_Asia 9.93
5 Siberia 1.84
6 SE_Asia 0.89
7 East_Africa 0.8
8 Americas 0.32
9 West_Africa 0.29
10 South_Africa 0.21
11 Oceania 0.06


Single Population Sharing:


# Population (source) Distance
1 Romanian 4.58
2 Montenegrin 5.2
3 Serbian 6.27
4 Bulgarian 6.95
5 Macedonian 7.26
6 Bosnian 7.82
7 Moldavian 8.73
8 Croatian 10.4
9 Kosovar 11
10 Albanian 11.82
11 Greek_Thessaly 12.37
12 Slovene 12.43
13 Hungarian 12.43
14 Italian_Tuscan 13.49
15 Slovak 13.51
16 German_South 14.68
17 French 14.94
18 Italian_Bergamo 15.38
19 Belgian 15.61
20 Greek_Central 16.12


Mixed Mode Population Sharing:


# Primary Population (source) Secondary Population (source) Distance
1 62.7% Greek_Central + 37.3% Russian @ 1.31
2 60.7% Greek_Central + 39.3% Mordovian @ 1.54
3 70.8% Greek_Central + 29.2% Finnish @ 1.61
4 56.2% Ashkenazy_Jew + 43.8% Russian @ 1.8
5 54.1% Ashkenazy_Jew + 45.9% Mordovian @ 1.85
6 73.2% Slovak + 26.8% Lebanese_Muslim @ 2.04
7 72.9% Slovak + 27.1% Lebanese_Druze @ 2.07
8 72.3% Slovak + 27.7% Syrian @ 2.14
9 68.8% Slovene + 31.2% Turkish @ 2.15
10 67% Slovak + 33% Turkish @ 2.2
11 54.6% Italian_Sicilian + 45.4% Russian @ 2.21
12 58.5% Belarusian + 41.5% Cypriot @ 2.22
13 71.1% Slovene + 28.9% Turkish_Kayseri @ 2.25
14 52.5% Italian_Sicilian + 47.5% Mordovian @ 2.26
15 73% Slovak + 27% Lebanese_Christian @ 2.3
16 67.5% German_North + 32.5% Lebanese_Muslim @ 2.31
17 61.3% Swedish + 38.7% Lebanese_Muslim @ 2.32
18 77.4% Bosnian + 22.6% Turkish_Aydin @ 2.34
19 69.3% Slovak + 30.7% Turkish_Kayseri @ 2.35
20 60.9% Swedish + 39.1% Lebanese_Druze @ 2.37


MDLP K16


# Population Percent
1 Caucasian 36.24
2 Neolithic 24.11
3 NorthEastEuropean 19.98
4 Steppe 14.11
5 NearEast 3.46
6 Siberian 1.01
7 SouthEastAsian 0.5
8 NorthAfrican 0.43
9 Oceanic 0.1
10 Amerindian 0.07


Single Population Sharing:


# Population (source) Distance
1 Bulgarian (Bulgaria) 3.92
2 Bulgarian (Bulgaria) 4.32
3 Gagauz (Gagauzia) 4.75
4 Macedonian (Macedonia) 5.45
5 Kosovar (Kosovo) 6.04
6 Greek (Thessaloniki) 6.43
7 Albanian (Albania) 6.48
8 Greek (Greece) 6.9
9 Romanian (Gorj) 7.07
10 Montenegrian (Montenegro) 7.12
11 Greek (Peloponnes) 7.2
12 Romanian (Romania) 7.21
13 Serbian (Serbia) 7.66
14 Romanian (Apuseni) 7.78
15 Greek (Macedonia) 7.8
16 Moldavian (Molodva) 8.66
17 Serbian (Bosnia-Herzegovina) 9.37
18 Italian (Friul) 9.52
19 Bosnian (Bosnia-Herzegovina) 9.9
20 Italian (Abruzzo) 10.03


Mixed Mode Population Sharing:


# Primary Population (source) Secondary Population (source) Distance
1 52.5% Greek (Greece) + 47.5% Hungarian (Budapest) @ 1.94
2 66.6% Greek (Greece) + 33.4% Lithuanian (Lithuania) @ 2.13
3 63.1% Greek (Greece) + 36.9% Belarusian (Belarus) @ 2.16
4 67.9% Greek (Greece) + 32.1% Latvian (Latvia) @ 2.2
5 54.7% Greek (Greece) + 45.3% Hungarian (Hungary) @ 2.32
6 91.1% Bulgarian (Bulgaria) + 8.9% Jew (Georgia) @ 2.33
7 59.1% Greek (Greece) + 40.9% Ukrainian (Ukraine) @ 2.35
8 89.7% Bulgarian (Bulgaria) + 10.3% Druze (Mount_Carmel) @ 2.35
9 62.2% Greek (Greece) + 37.8% Pole (Poland) @ 2.36
10 68.9% Greek (Greece) + 31.1% Latvian_Dobele (Dobele) @ 2.38
11 91% Bulgarian (Bulgaria) + 9% Armenian (Armenia) @ 2.39
12 68.4% Greek (Greece) + 31.6% Estonian (Estonia) @ 2.39
13 62.4% Greek (Greece) + 37.6% Ukrainians_east (EastUkraine) @ 2.48
14 58.8% Greek (Greece) + 41.2% Ukrainians_west (WestUkraine) @ 2.49
15 91.7% Bulgarian (Bulgaria) + 8.3% Turk (Trabzon) @ 2.49
16 60.4% Greek (Greece) + 39.6% Belarusian_West (WestBelarus) @ 2.49
17 56.5% Serbian (Bosnia-Herzegovina) + 43.5% Greek (Greece) @ 2.52
18 67.1% Greek (Athens) + 32.9% Russians-West (WestRussian) @ 2.52
19 88.7% Bulgarian (Bulgaria) + 11.3% Cypriot (Cyprus) @ 2.53
20 82.2% Serbian (Serbia) + 17.8% Turk (Trabzon) @ 2.54




Compare that to Eurogenes K13 :lol:


# Population Percent
1 Baltic 40.44
2 East_Med 32.72
3 West_Med 12.73
4 North_Atlantic 7.6
5 West_Asian 3.66
6 Siberian 1.72
7 Red_Sea 0.93
8 Northeast_African 0.2


Single Population Sharing:


# Population (source) Distance
1 Bulgarian 23.51
2 Romanian 25.08
3 Moldavian 25.44
4 Serbian 25.87
5 Greek_Thessaly 26.6
6 Croatian 26.82
7 Southwest_Russian 28.72
8 Ashkenazi 29.18
9 Ukrainian 29.24
10 Ukrainian_Lviv 29.27
11 Ukrainian_Belgorod 29.43
12 Hungarian 30.03
13 South_Polish 30.4
14 Erzya 30.96
15 Central_Greek 31.07
16 Estonian_Polish 31.12
17 Russian_Smolensk 31.58
18 Belorussian 31.68
19 East_Sicilian 31.69
20 Polish 32.17


Mixed Mode Population Sharing:


# Primary Population (source) Secondary Population (source) Distance
1 56% Lithuanian + 44% Lebanese_Christian @ 17.49
2 60.2% Erzya + 39.8% Lebanese_Christian @ 17.53
3 59.7% Erzya + 40.3% Samaritan @ 17.63
4 59.4% Erzya + 40.6% Lebanese_Druze @ 17.69
5 55.5% Lithuanian + 44.5% Samaritan @ 17.71
6 63% Southwest_Russian + 37% Lebanese_Christian @ 17.76
7 60.1% Estonian_Polish + 39.9% Lebanese_Christian @ 17.77
8 55.2% Lithuanian + 44.8% Lebanese_Druze @ 17.77
9 62.6% Southwest_Russian + 37.4% Samaritan @ 17.88
10 59.5% Belorussian + 40.5% Lebanese_Christian @ 17.95
11 59.3% Estonian_Polish + 40.7% Lebanese_Druze @ 17.96
12 59.6% Estonian_Polish + 40.4% Samaritan @ 17.99
13 62.3% Southwest_Russian + 37.7% Lebanese_Druze @ 18.01
14 64.2% Erzya + 35.8% Yemenite_Jewish @ 18.13
15 59.1% Belorussian + 40.9% Samaritan @ 18.14
16 66.9% Southwest_Russian + 33.1% Yemenite_Jewish @ 18.17
17 58.8% Belorussian + 41.2% Lebanese_Druze @ 18.25
18 57.3% Erzya + 42.7% Cyprian @ 18.34
19 62.4% Ukrainian_Belgorod + 37.6% Lebanese_Christian @ 18.43
20 62% Ukrainian_Belgorod + 38% Samaritan @ 18.5

Ion Basescul
10-02-2020, 06:43 PM
OK maybe those were Romanians from another study that were mixed with Gypsies.

It's these ones. There are only 2 datasets as far as I am aware: David Reich's from Gorj and Alba, and these ones from Behar, where Behar mixes full Gypsies with Romanians in the PCA and other analyses.

vbnetkhio
10-02-2020, 06:49 PM
Unsurprisingly, all of those seem to be from Behar and Yunusbayev at the Estonian Biocentre. So, I guess the best conclusion for now is that they produce such weird results because David used them as references.
Do you guys know of a calculator that didn't use them? We could try and see if that's the reason.

mdlp k11, it was built from ancient samples only.

it doesn't have a proper oracle but you can compare to this one i started building:


British,0.00,1.00,0.00,0.00,1.67,33.37,1.22,26.00,0.00,0.00,36.10
Ukrainian,0.01,0.52,0.14,2.28,1.46,26.17,0.23,26.46,0.37,1.17,41.14
Serb,0.01,0.30,0.08,6.66,2.54,34.54,0.18,25.20,0.32,0.51,29.66
Polish,0.02,0.62,0.07,2.93,1.52,27.53,0.09,27.14,0.11,0.17,39.73
Hungarian,0.00,0.00,0.00,5.07,0.00,32.98,0.56,25.11,0.00,0.10,36.19
Greek_mainland,0.10,0.29,0.27,14.03,3.47,38.15,0.21,25.01,0.30,0.20,17.96
Montenegrin,0.00,0.41,0.21,7.55,3.03,36.73,0.06,24.66,0.11,0.42,26.81
Moldavian,0.29,0.22,0.33,5.45,2.06,29.13,0.23,25.75,1.16,0.84,34.39
Early Slav,1.16,0.85,0.08,0.60,1.24,28.47,0.00,25.44,0.41,0.39,41.35
German_East,0.00,0.00,0.00,0.00,0.73,30.66,0.00,26.97,0.00,0.79,40.86
Croat,0.00,0.32,0.10,6.33,1.66,33.56,0.34,25.47,0.19,0.47,31.55
Albanian,0.15,0.21,0.12,12.22,3.38,41.07,0.33,24.45,0.29,0.32,17.47
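The comparison vbnetkhio suggests works like a small oracle: rank the reference averages by their distance to a target. Below is a minimal Python sketch. It assumes plain Euclidean distance over the 11 MDLP K11 components. The Vahaduo "Distance to:" lists later in the thread look like this kind of output, but their exact formula is not confirmed here. Romania4's and Romania5's own MDLP K11 scores aren't posted, so the usage example ranks the other averages by their distance to the Moldavian average.

```python
import math

# MDLP K11 reference averages as posted in the thread (11 components each).
REFERENCES_CSV = """\
British,0.00,1.00,0.00,0.00,1.67,33.37,1.22,26.00,0.00,0.00,36.10
Ukrainian,0.01,0.52,0.14,2.28,1.46,26.17,0.23,26.46,0.37,1.17,41.14
Serb,0.01,0.30,0.08,6.66,2.54,34.54,0.18,25.20,0.32,0.51,29.66
Polish,0.02,0.62,0.07,2.93,1.52,27.53,0.09,27.14,0.11,0.17,39.73
Hungarian,0.00,0.00,0.00,5.07,0.00,32.98,0.56,25.11,0.00,0.10,36.19
Greek_mainland,0.10,0.29,0.27,14.03,3.47,38.15,0.21,25.01,0.30,0.20,17.96
Montenegrin,0.00,0.41,0.21,7.55,3.03,36.73,0.06,24.66,0.11,0.42,26.81
Moldavian,0.29,0.22,0.33,5.45,2.06,29.13,0.23,25.75,1.16,0.84,34.39
Early Slav,1.16,0.85,0.08,0.60,1.24,28.47,0.00,25.44,0.41,0.39,41.35
German_East,0.00,0.00,0.00,0.00,0.73,30.66,0.00,26.97,0.00,0.79,40.86
Croat,0.00,0.32,0.10,6.33,1.66,33.56,0.34,25.47,0.19,0.47,31.55
Albanian,0.15,0.21,0.12,12.22,3.38,41.07,0.33,24.45,0.29,0.32,17.47
"""


def parse_references(text):
    """Parse 'Name,v1,...,v11' lines into {name: [floats]}."""
    references = {}
    for line in text.strip().splitlines():
        name, *values = line.split(",")
        references[name] = [float(v) for v in values]
    return references


def rank_by_distance(target, references):
    """(distance, name) pairs sorted nearest first, using Euclidean distance."""
    return sorted((math.dist(target, vector), name) for name, vector in references.items())


if __name__ == "__main__":
    refs = parse_references(REFERENCES_CSV)
    # Romania4/5's MDLP K11 scores aren't in the thread, so use the
    # Moldavian average as the target and rank the other averages.
    target = refs.pop("Moldavian")
    for distance, name in rank_by_distance(target, refs):
        print(f"{distance:8.4f}  {name}")
```

Pasting a real sample's 11 MDLP K11 percentages in place of `target` gives the kind of ranked list shown below.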

Ion Basescul
10-02-2020, 06:55 PM
Yep, I took a quick look with Romania4 and Romania5 who had issues and they work fine here.


Distance to: Romania4
2.88749026   Montenegrin
5.03665564   Serb
7.00624008   Croat
9.79695871   Greek_mainland
10.55597461  Albanian
11.56684054  Moldavian
11.83931586  Hungarian
14.24914383  British
17.63675140  Polish
18.62794138  German_East
19.32760720  EarlySlav
19.47360778  Ukrainian


Target: Romania4
Distance: 1.9156% / 1.91564203 | ADC: 0.25x
47.2  Montenegrin
23.6  Greek_mainland
20.6  Serb
5.6   Moldavian
3.0   Croat


Distance to: Romania5
7.33466427   Montenegrin
8.08181292   Greek_mainland
8.80423762   Serb
10.34410460  Croat
10.47838251  Albanian
13.61249059  Moldavian
14.99327516  Hungarian
18.08355054  British
19.78615930  Polish
21.54866817  Ukrainian
21.75220678  German_East
22.03236937  EarlySlav


Target: Romania5
Distance: 3.1267% / 3.12672721 | ADC: 0.25x
64.8  Greek_mainland
35.2  Moldavian
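The Vahaduo result above models Romania5 as 64.8% Greek_mainland + 35.2% Moldavian. GEDmatch's "Mixed Mode Population Sharing" lists similar two-population fits. A two-source fit of that kind can be sketched as a grid search over the mixing share, scoring each candidate mix by its distance to the target. This is a minimal sketch under stated assumptions: Euclidean distance, a 0.1-percentage-point grid, and only a subset of vbnetkhio's averages. It is not a reimplementation of Vahaduo or GEDmatch, whose exact algorithms may differ. Romania5's MDLP K11 scores aren't posted, so the usage example builds a synthetic 60/40 Serb + Polish target and checks that the fit recovers it.

```python
import itertools
import math

# A subset of the MDLP K11 reference averages posted in the thread.
REFERENCES = {
    "Montenegrin": [0.00, 0.41, 0.21, 7.55, 3.03, 36.73, 0.06, 24.66, 0.11, 0.42, 26.81],
    "Greek_mainland": [0.10, 0.29, 0.27, 14.03, 3.47, 38.15, 0.21, 25.01, 0.30, 0.20, 17.96],
    "Serb": [0.01, 0.30, 0.08, 6.66, 2.54, 34.54, 0.18, 25.20, 0.32, 0.51, 29.66],
    "Moldavian": [0.29, 0.22, 0.33, 5.45, 2.06, 29.13, 0.23, 25.75, 1.16, 0.84, 34.39],
    "Croat": [0.00, 0.32, 0.10, 6.33, 1.66, 33.56, 0.34, 25.47, 0.19, 0.47, 31.55],
    "Polish": [0.02, 0.62, 0.07, 2.93, 1.52, 27.53, 0.09, 27.14, 0.11, 0.17, 39.73],
}


def fit_pair(target, a, b, steps=1000):
    """Grid-search the share p of source a in p*a + (1-p)*b closest to target.

    Returns (p, Euclidean distance of that mix to target)."""
    best_p, best_d = 0.0, math.inf
    for i in range(steps + 1):
        p = i / steps
        mixture = [p * x + (1 - p) * y for x, y in zip(a, b)]
        d = math.dist(target, mixture)
        if d < best_d:
            best_p, best_d = p, d
    return best_p, best_d


def best_pairs(target, references, top=5, steps=1000):
    """Rank every two-source model, closest first: (distance, name_a, p, name_b)."""
    results = []
    for (name_a, a), (name_b, b) in itertools.combinations(references.items(), 2):
        p, d = fit_pair(target, a, b, steps)
        results.append((d, name_a, p, name_b))
    results.sort()
    return results[:top]


if __name__ == "__main__":
    # Synthetic target: a known 60/40 Serb + Polish mix (not a real sample).
    target = [0.6 * s + 0.4 * q for s, q in zip(REFERENCES["Serb"], REFERENCES["Polish"])]
    for d, name_a, p, name_b in best_pairs(target, REFERENCES):
        print(f"{100 * p:.1f}% {name_a} + {100 * (1 - p):.1f}% {name_b} @ {d:.4f}")
```

A grid search is crude but transparent. Models with three or more sources, like the five-way Romania4 fit, need a proper optimizer, because the number of grid points grows quickly with each added source.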

Dirdepo
10-02-2020, 07:52 PM
Vlach gonna break the calculator fam, that's just how it goes

Leto
10-02-2020, 09:27 PM
Vlach gonna break the calculator fam, that's just how it goes
Cumansky, you've been identified and reported, just so you know.