Behar Romanians: Weirdest Academic Samples
Ion Basescul
10-02-2020, 05:26 PM
These guys were collected back in 2010 as part of this study (https://www.nature.com/articles/nature09103).
Some of you might have seen graphs like these in which they are featured.
https://i.ibb.co/qrWwgWw/image.png
https://2.bp.blogspot.com/_Ish7688voT0/TA_8VX3jGkI/AAAAAAAACcM/HVkOLdPm94g/s1600/admixture-global.jpg
https://1.bp.blogspot.com/_Ish7688voT0/TBDgV2r3hxI/AAAAAAAACck/sYi1shNB8bc/s1600/westeurasianpca.jpg
There are two Gypsies/Romas in the 16-sample dataset, as you can see on the PCA above, but those score as expected in Eurogenes K13.
It's the Romanians that have weird results. Some of them don't look out of the ordinary, while others have an unbalanced distribution between North Atlantic and Baltic. They don't look like any other population that I know of in Europe and beyond.
Behold the 14 Romanians
Blue: those that score like regular Romanians
Red: impossible mix, or I've no idea what populations you need to mix to get such a large discrepancy between North Atlantic and Baltic
Green: Jewish or Greek immediate family
Name           N_Atlantic  Baltic  West_Med  West_Asian  East_Med  Red_Sea  South_Asian  East_Asian  Siberian  Amerindian  Oceanian  NE_African  Sub-Saharan  N_Atlantic+Baltic
Romania1.txt        24.66   23.90     16.52       10.04     19.87     3.28         0.00        0.48      0.68        0.00      0.58        0.00         0.00              48.56
Romania2.txt        14.94   30.25     18.10       12.55     17.83     1.55         1.44        1.79      0.00        0.80      0.37        0.00         0.40              45.19
Romania4.txt        13.21   35.01     20.29        9.91     14.14     4.96         0.00        1.24      0.40        0.28      0.56        0.00         0.00              48.22
Romania5.txt         7.59   40.42     12.72        3.67     32.75     0.93         0.00        0.00      1.72        0.00      0.00        0.20         0.00              48.01
Romania6.txt        15.01   27.66     21.48        7.86     23.12     0.00         1.98        0.00      0.66        1.17      1.05        0.00         0.00              42.67
Romania8.txt        21.00   25.37     16.21       13.17     19.20     2.18         0.90        0.15      0.81        0.91      0.00        0.09         0.00              46.37
Romania9.txt        23.61   24.26     17.82        3.69     27.06     0.00         1.31        0.42      0.57        1.02      0.23        0.00         0.00              47.87
Romania10.txt       17.69   34.20     13.56        6.35     24.82     0.78         0.43        0.00      1.03        0.89      0.00        0.00         0.24              51.89
Romania11.txt       27.41   24.25     18.52       10.78     15.36     0.83         0.22        0.00      1.45        0.85      0.00        0.00         0.33              51.66
Romania12.txt        7.12   34.64     22.99       11.43     22.03     0.00         0.00        0.69      0.43        0.47      0.20        0.00         0.00              41.76
Romania13.txt       19.21   32.76     21.60        6.80     14.84     2.37         0.00        0.00      0.35        1.51      0.56        0.00         0.00              51.97
Romania14.txt        5.29   35.53     25.56        6.65     23.24     0.00         2.14        0.18      1.06        0.06      0.30        0.00         0.00              40.82
Romania15.txt        6.62   35.85     26.08       13.27     14.14     1.10         0.00        0.00      2.25        0.13      0.56        0.00         0.00              42.47
Romania16.txt       13.64   40.86     16.74        7.82     17.75     0.00         0.91        0.00      1.57        0.62      0.10        0.00         0.00              54.50
2 full Romas/Gypsies, which look normal compared to the others that I have seen
Name               N_Atlantic  Baltic  West_Med  West_Asian  East_Med  Red_Sea  South_Asian  East_Asian  Siberian  Amerindian  Oceanian  NE_African  Sub-Saharan
Romania3_Roma.txt       14.91   11.19     11.66       15.92     16.75     3.86        22.30        1.08      0.98        0.00      0.72        0.64         0.00
Romania7_Roma.txt        3.94    8.70     12.96       16.67     23.85     2.36        27.17        2.57      0.00        0.39      1.40        0.00         0.00
I have asked the data manager at the Estonian Biocentre to confirm the localities from where they were collected, since that is not mentioned in the study.
Yes, some of them are straight up impossible. By the way, the Roma are also weird, see the North Atlantic difference between them.
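To put a number on the imbalance, here is a minimal Python sketch that computes Baltic minus N_Atlantic for each of the 14 Romanian samples and sorts them from largest to smallest gap. The figures are copied from the Eurogenes K13 table in this post; the dictionary and function names are my own and not part of any calculator.

```python
# (N_Atlantic, Baltic) percentages per sample, copied from the K13 table in this post.
SAMPLES = {
    "Romania1": (24.66, 23.90),
    "Romania2": (14.94, 30.25),
    "Romania4": (13.21, 35.01),
    "Romania5": (7.59, 40.42),
    "Romania6": (15.01, 27.66),
    "Romania8": (21.00, 25.37),
    "Romania9": (23.61, 24.26),
    "Romania10": (17.69, 34.20),
    "Romania11": (27.41, 24.25),
    "Romania12": (7.12, 34.64),
    "Romania13": (19.21, 32.76),
    "Romania14": (5.29, 35.53),
    "Romania15": (6.62, 35.85),
    "Romania16": (13.64, 40.86),
}


def baltic_minus_natlantic(samples):
    """Return (name, gap) pairs sorted from the largest Baltic - N_Atlantic gap down."""
    gaps = [(name, baltic - n_atl) for name, (n_atl, baltic) in samples.items()]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


ranked = baltic_minus_natlantic(SAMPLES)
for name, gap in ranked:
    print(f"{name:<10} {gap:6.2f}")
```

On these figures the gap runs from about -3 points (Romania11) to about +33 points (Romania5), even though the N_Atlantic+Baltic sum stays within roughly 41 to 55 for everyone.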
vbnetkhio
10-02-2020, 05:33 PM
that looks like the calculator effect, those 5 were probably used as reference samples for Eurogenes k13.
that looks like the calculator effect, those 5 were probably used as reference samples for Eurogenes k13.
Behar was on Dodecad, I had it replaced with a more realistic average. Dodecad is full of such weird-ass averages.
I've seen such results before. even with 2% north atlantic
Ion Basescul
10-02-2020, 05:43 PM
that looks like the calculator effect, those 5 were probably used as reference samples for Eurogenes k13.
Does that even matter? Aren't they just compared against a single result, which is the average with like 24% NA and 24% Baltic?
WeirdLookingFellow
10-02-2020, 05:57 PM
Why are these studies not even using 30 samples? I don't know about Genetics 101 but Statistics 101 says that if you don't have 30 samples you might as well give up, you can't run more than a Student's t-test and expect the results to be worth anything.
I get that the Baltic + N Atlantic ends up being in the Basescu average more or less but this is ridiculous, at least for me.
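For a rough sense of what 14 samples can and cannot tell you, here is a sketch that computes the mean N_Atlantic score of the 14 Romanians from the first post with a 95% confidence interval based on Student's t. The critical value 2.160 (13 degrees of freedom) is hard-coded from a standard t-table, and treating the samples as independent and roughly normal is an assumption on my part, not something the study states.

```python
from math import sqrt
from statistics import mean, stdev

# N_Atlantic percentages of the 14 Romanian samples, copied from the first post.
N_ATLANTIC = [24.66, 14.94, 13.21, 7.59, 15.01, 21.00, 23.61,
              17.69, 27.41, 7.12, 19.21, 5.29, 6.62, 13.64]

# Two-sided 95% critical value of Student's t for 13 degrees of freedom (t-table value).
T_CRIT_95_DF13 = 2.160


def mean_with_ci(values, t_crit):
    """Return (mean, lower, upper) for a t-based confidence interval of the mean."""
    n = len(values)
    m = mean(values)
    standard_error = stdev(values) / sqrt(n)  # stdev() is the sample standard deviation
    half_width = t_crit * standard_error
    return m, m - half_width, m + half_width


m, low, high = mean_with_ci(N_ATLANTIC, T_CRIT_95_DF13)
print(f"mean N_Atlantic = {m:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With these numbers the mean comes out at 15.5 with an interval of roughly plus or minus 4 points, so a small sample still gives an estimate, just a wide one.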
vbnetkhio
10-02-2020, 06:04 PM
Does that even matter? Aren't they just compared against a single result, which is the average with like 24% NA and 24% Baltic?
i mean the reference samples for sourcing the components. The "Baltic", "North Atlantic" etc, are partially built from those 5 samples' dna.
when you run reference samples through the calculator which was built from those very samples, they get weird results like these. that's the calculator effect.
i think 6, 10 and 16 are also probably reference samples.
Ion Basescul
10-02-2020, 06:19 PM
I've seen such results before. even with 2% north atlantic
2% is impossible even for full Romas. The problem is the difference between Baltic and North Atlantic, because the Baltic source also carries a lot of Atlantic. 35% Baltic and 5% NA is impossible to reproduce with modern populations.
2% is impossible even for full Romas. The problem is the difference between Baltic and North Atlantic, because the Baltic source also carries a lot of Atlantic.
it was a huge list with kit numbers. if you dig the forum you may find them. there were kits from all over the balkans. the ones with very very low north atlantic had very high baltic and east med.
Peterski
10-02-2020, 06:23 PM
Some of them are mixed with Gypsies IIRC.
Ion Basescul
10-02-2020, 06:24 PM
it was a huge list with kit numbers. if you dig the forum you may find them. there were kits from all over the balkans. the ones with very very low north atlantic had very high baltic and east med.
Probably some kind of damaged kits.
Ion Basescul
10-02-2020, 06:25 PM
Some of them are mixed with Gypsies IIRC.
It looks like none are. There are just 2 full Gypsies and 14 unmixed Romanians, out of whom some have very weird differences between North Atlantic and Baltic.
vbnetkhio
10-02-2020, 06:25 PM
it was a huge list with kit numbers. if you dig the forum you may find them. there were kits from all over the balkans. the ones with very very low north atlantic had very high baltic and east med.
those were academic samples uploaded by Ajeje Brazorf, including these Romanians. some others on the list also suffered from the calculator effect.
https://www.theapricity.com/forum/showthread.php?245352-Random-GEDmatch-kits
they are deleted from gedmatch now.
Ion Basescul
10-02-2020, 06:28 PM
those were academic samples uploaded by Ajeje Brazorf, including these Romanians. some others on the list also suffered from the calculator effect.
https://www.theapricity.com/forum/showthread.php?245352-Random-GEDmatch-kits
they are deleted from gedmatch now.
Unsurprisingly, all of those seem to be from Behar and Yusunbaev at the Estonian Biocentre. So, I guess the best conclusion for now is that they produce such weird results because David used them as references.
Do you guys know of a calculator that didn't use them? We could try and see if that's the reason.
Unsurprisingly, all of those seem to be from Behar and Yusunbaev at the Estonian Biocentre. So, I guess the best conclusion for now is that they produce such weird results because David used them as references.
Do you guys know of a calculator that didn't use them? We could try and see if that's the reason.
Maybe MDLP K16?
Peterski
10-02-2020, 06:36 PM
It looks like none are.
OK maybe those were Romanians from another study that were mixed with Gypsies.
Ion Basescul
10-02-2020, 06:42 PM
Maybe MDLP K16?
Yeah, vbnetkhio is right. Those with weird results are being used as references for Eurogenes K13, resulting in a calculator effect.
Romania5 (looks like a person from Muntenia)
puntDNAL K13
# Population Percent
1 NE_Europe 36.66
2 SW_Europe 30.04
3 West_Asia 18.95
4 SW_Asia 9.93
5 Siberia 1.84
6 SE_Asia 0.89
7 East_Africa 0.8
8 Americas 0.32
9 West_Africa 0.29
10 South_Africa 0.21
11 Oceania 0.06
Single Population Sharing:
# Population (source) Distance
1 Romanian 4.58
2 Montenegrin 5.2
3 Serbian 6.27
4 Bulgarian 6.95
5 Macedonian 7.26
6 Bosnian 7.82
7 Moldavian 8.73
8 Croatian 10.4
9 Kosovar 11
10 Albanian 11.82
11 Greek_Thessaly 12.37
12 Slovene 12.43
13 Hungarian 12.43
14 Italian_Tuscan 13.49
15 Slovak 13.51
16 German_South 14.68
17 French 14.94
18 Italian_Bergamo 15.38
19 Belgian 15.61
20 Greek_Central 16.12
Mixed Mode Population Sharing:
# Primary Population (source) Secondary Population (source) Distance
1 62.7% Greek_Central + 37.3% Russian @ 1.31
2 60.7% Greek_Central + 39.3% Mordovian @ 1.54
3 70.8% Greek_Central + 29.2% Finnish @ 1.61
4 56.2% Ashkenazy_Jew + 43.8% Russian @ 1.8
5 54.1% Ashkenazy_Jew + 45.9% Mordovian @ 1.85
6 73.2% Slovak + 26.8% Lebanese_Muslim @ 2.04
7 72.9% Slovak + 27.1% Lebanese_Druze @ 2.07
8 72.3% Slovak + 27.7% Syrian @ 2.14
9 68.8% Slovene + 31.2% Turkish @ 2.15
10 67% Slovak + 33% Turkish @ 2.2
11 54.6% Italian_Sicilian + 45.4% Russian @ 2.21
12 58.5% Belarusian + 41.5% Cypriot @ 2.22
13 71.1% Slovene + 28.9% Turkish_Kayseri @ 2.25
14 52.5% Italian_Sicilian + 47.5% Mordovian @ 2.26
15 73% Slovak + 27% Lebanese_Christian @ 2.3
16 67.5% German_North + 32.5% Lebanese_Muslim @ 2.31
17 61.3% Swedish + 38.7% Lebanese_Muslim @ 2.32
18 77.4% Bosnian + 22.6% Turkish_Aydin @ 2.34
19 69.3% Slovak + 30.7% Turkish_Kayseri @ 2.35
20 60.9% Swedish + 39.1% Lebanese_Druze @ 2.37
MDLP K16
# Population Percent
1 Caucasian 36.24
2 Neolithic 24.11
3 NorthEastEuropean 19.98
4 Steppe 14.11
5 NearEast 3.46
6 Siberian 1.01
7 SouthEastAsian 0.5
8 NorthAfrican 0.43
9 Oceanic 0.1
10 Amerindian 0.07
Single Population Sharing:
# Population (source) Distance
1 Bulgarian (Bulgaria) 3.92
2 Bulgarian (Bulgaria) 4.32
3 Gagauz (Gagauzia) 4.75
4 Macedonian (Macedonia) 5.45
5 Kosovar (Kosovo) 6.04
6 Greek (Thessaloniki) 6.43
7 Albanian (Albania) 6.48
8 Greek (Greece) 6.9
9 Romanian (Gorj) 7.07
10 Montenegrian (Montenegro) 7.12
11 Greek (Peloponnes) 7.2
12 Romanian (Romania) 7.21
13 Serbian (Serbia) 7.66
14 Romanian (Apuseni) 7.78
15 Greek (Macedonia) 7.8
16 Moldavian (Molodva) 8.66
17 Serbian (Bosnia-Herzegovina) 9.37
18 Italian (Friul) 9.52
19 Bosnian (Bosnia-Herzegovina) 9.9
20 Italian (Abruzzo) 10.03
Mixed Mode Population Sharing:
# Primary Population (source) Secondary Population (source) Distance
1 52.5% Greek (Greece) + 47.5% Hungarian (Budapest) @ 1.94
2 66.6% Greek (Greece) + 33.4% Lithuanian (Lithuania) @ 2.13
3 63.1% Greek (Greece) + 36.9% Belarusian (Belarus) @ 2.16
4 67.9% Greek (Greece) + 32.1% Latvian (Latvia) @ 2.2
5 54.7% Greek (Greece) + 45.3% Hungarian (Hungary) @ 2.32
6 91.1% Bulgarian (Bulgaria) + 8.9% Jew (Georgia) @ 2.33
7 59.1% Greek (Greece) + 40.9% Ukrainian (Ukraine) @ 2.35
8 89.7% Bulgarian (Bulgaria) + 10.3% Druze (Mount_Carmel) @ 2.35
9 62.2% Greek (Greece) + 37.8% Pole (Poland) @ 2.36
10 68.9% Greek (Greece) + 31.1% Latvian_Dobele (Dobele) @ 2.38
11 91% Bulgarian (Bulgaria) + 9% Armenian (Armenia) @ 2.39
12 68.4% Greek (Greece) + 31.6% Estonian (Estonia) @ 2.39
13 62.4% Greek (Greece) + 37.6% Ukrainians_east (EastUkraine) @ 2.48
14 58.8% Greek (Greece) + 41.2% Ukrainians_west (WestUkraine) @ 2.49
15 91.7% Bulgarian (Bulgaria) + 8.3% Turk (Trabzon) @ 2.49
16 60.4% Greek (Greece) + 39.6% Belarusian_West (WestBelarus) @ 2.49
17 56.5% Serbian (Bosnia-Herzegovina) + 43.5% Greek (Greece) @ 2.52
18 67.1% Greek (Athens) + 32.9% Russians-West (WestRussian) @ 2.52
19 88.7% Bulgarian (Bulgaria) + 11.3% Cypriot (Cyprus) @ 2.53
20 82.2% Serbian (Serbia) + 17.8% Turk (Trabzon) @ 2.54
Compare that to Eurogenes K13 :lol:
# Population Percent
1 Baltic 40.44
2 East_Med 32.72
3 West_Med 12.73
4 North_Atlantic 7.6
5 West_Asian 3.66
6 Siberian 1.72
7 Red_Sea 0.93
8 Northeast_African 0.2
Single Population Sharing:
# Population (source) Distance
1 Bulgarian 23.51
2 Romanian 25.08
3 Moldavian 25.44
4 Serbian 25.87
5 Greek_Thessaly 26.6
6 Croatian 26.82
7 Southwest_Russian 28.72
8 Ashkenazi 29.18
9 Ukrainian 29.24
10 Ukrainian_Lviv 29.27
11 Ukrainian_Belgorod 29.43
12 Hungarian 30.03
13 South_Polish 30.4
14 Erzya 30.96
15 Central_Greek 31.07
16 Estonian_Polish 31.12
17 Russian_Smolensk 31.58
18 Belorussian 31.68
19 East_Sicilian 31.69
20 Polish 32.17
Mixed Mode Population Sharing:
# Primary Population (source) Secondary Population (source) Distance
1 56% Lithuanian + 44% Lebanese_Christian @ 17.49
2 60.2% Erzya + 39.8% Lebanese_Christian @ 17.53
3 59.7% Erzya + 40.3% Samaritan @ 17.63
4 59.4% Erzya + 40.6% Lebanese_Druze @ 17.69
5 55.5% Lithuanian + 44.5% Samaritan @ 17.71
6 63% Southwest_Russian + 37% Lebanese_Christian @ 17.76
7 60.1% Estonian_Polish + 39.9% Lebanese_Christian @ 17.77
8 55.2% Lithuanian + 44.8% Lebanese_Druze @ 17.77
9 62.6% Southwest_Russian + 37.4% Samaritan @ 17.88
10 59.5% Belorussian + 40.5% Lebanese_Christian @ 17.95
11 59.3% Estonian_Polish + 40.7% Lebanese_Druze @ 17.96
12 59.6% Estonian_Polish + 40.4% Samaritan @ 17.99
13 62.3% Southwest_Russian + 37.7% Lebanese_Druze @ 18.01
14 64.2% Erzya + 35.8% Yemenite_Jewish @ 18.13
15 59.1% Belorussian + 40.9% Samaritan @ 18.14
16 66.9% Southwest_Russian + 33.1% Yemenite_Jewish @ 18.17
17 58.8% Belorussian + 41.2% Lebanese_Druze @ 18.25
18 57.3% Erzya + 42.7% Cyprian @ 18.34
19 62.4% Ukrainian_Belgorod + 37.6% Lebanese_Christian @ 18.43
20 62% Ukrainian_Belgorod + 38% Samaritan @ 18.5
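For anyone wondering what the "Mixed Mode" lines mean: as far as I understand it, Oracle-style tools take every pair of reference populations, find the mixing proportion that brings the blend closest to your result, and report the leftover Euclidean distance after the "@". Below is a minimal sketch of that idea, not the actual Oracle code. The function names are made up, and because the calculator's population averages are not posted in this thread, the demo uses a few individual K13 rows from the first post as stand-in sources, so its output is only illustrative.

```python
from itertools import combinations
from math import sqrt


def best_two_way_mix(target, a, b):
    """Find w in [0, 1] minimising the distance between target and w*a + (1-w)*b.

    Returns (w, distance). Uses the closed-form least-squares projection of the
    target onto the segment between a and b, clamped to the valid range.
    """
    diff_ab = [x - y for x, y in zip(a, b)]
    diff_tb = [t - y for t, y in zip(target, b)]
    denom = sum(d * d for d in diff_ab)
    if denom == 0:
        w = 1.0  # a and b are identical, any weight gives the same blend
    else:
        w = sum(p * q for p, q in zip(diff_tb, diff_ab)) / denom
        w = min(1.0, max(0.0, w))
    blend = [w * x + (1 - w) * y for x, y in zip(a, b)]
    distance = sqrt(sum((t - m) ** 2 for t, m in zip(target, blend)))
    return w, distance


def mixed_mode(target, sources):
    """Rank all source pairs by the distance of their best blend to the target."""
    results = []
    for (name_a, a), (name_b, b) in combinations(sources.items(), 2):
        w, distance = best_two_way_mix(target, a, b)
        results.append((distance, name_a, w, name_b))
    return sorted(results)


# Eurogenes K13 rows from the first post (13 components each). Stand-in data only.
TARGET = [7.59, 40.42, 12.72, 3.67, 32.75, 0.93, 0.00, 0.00, 1.72, 0.00, 0.00, 0.20, 0.00]  # Romania5
SOURCES = {
    "Romania1": [24.66, 23.90, 16.52, 10.04, 19.87, 3.28, 0.00, 0.48, 0.68, 0.00, 0.58, 0.00, 0.00],
    "Romania9": [23.61, 24.26, 17.82, 3.69, 27.06, 0.00, 1.31, 0.42, 0.57, 1.02, 0.23, 0.00, 0.00],
    "Romania14": [5.29, 35.53, 25.56, 6.65, 23.24, 0.00, 2.14, 0.18, 1.06, 0.06, 0.30, 0.00, 0.00],
    "Romania16": [13.64, 40.86, 16.74, 7.82, 17.75, 0.00, 0.91, 0.00, 1.57, 0.62, 0.10, 0.00, 0.00],
}

results = mixed_mode(TARGET, SOURCES)
for distance, name_a, w, name_b in results:
    print(f"{100 * w:.1f}% {name_a} + {100 * (1 - w):.1f}% {name_b} @ {distance:.2f}")
```

Because the weight is clamped to the 0 to 1 range, a pair can never fit worse than the closer of its two members on its own, which is why mixed-mode distances are always at or below the single-population ones.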
Ion Basescul
10-02-2020, 06:43 PM
OK maybe those were Romanians from another study that were mixed with Gypsies.
It's these ones. There are only 2 datasets as far as I am aware. David Reich's from Gorj and Alba and these ones from Behar, where Behar is mixing full Gypsies with Romanians in the PCA and other analyses.
vbnetkhio
10-02-2020, 06:49 PM
Unsurprisingly, all of those seem to be from Behar and Yusunbaev at the Estonian Biocentre. So, I guess the best conclusion for now is that they produce such weird results because David used them as references.
Do you guys know of a calculator that didn't use them? We could try and see if that's the reason.
mdlp k11, it was built from ancient samples only.
it doesn't have a proper oracle but you can compare to this one i started building:
British,0.00,1.00,0.00,0.00,1.67,33.37,1.22,26.00,0.00,0.00,36.10
Ukrainian,0.01,0.52,0.14,2.28,1.46,26.17,0.23,26.46,0.37,1.17,41.14
Serb,0.01,0.30,0.08,6.66,2.54,34.54,0.18,25.20,0.32,0.51,29.66
Polish,0.02,0.62,0.07,2.93,1.52,27.53,0.09,27.14,0.11,0.17,39.73
Hungarian,0.00,0.00,0.00,5.07,0.00,32.98,0.56,25.11,0.00,0.10,36.19
Greek_mainland,0.10,0.29,0.27,14.03,3.47,38.15,0.21,25.01,0.30,0.20,17.96
Montenegrin,0.00,0.41,0.21,7.55,3.03,36.73,0.06,24.66,0.11,0.42,26.81
Moldavian,0.29,0.22,0.33,5.45,2.06,29.13,0.23,25.75,1.16,0.84,34.39
Early Slav,1.16,0.85,0.08,0.60,1.24,28.47,0.00,25.44,0.41,0.39,41.35
German_East,0.00,0.00,0.00,0.00,0.73,30.66,0.00,26.97,0.00,0.79,40.86
Croat,0.00,0.32,0.10,6.33,1.66,33.56,0.34,25.47,0.19,0.47,31.55
Albanian,0.15,0.21,0.12,12.22,3.38,41.07,0.33,24.45,0.29,0.32,17.47
Ion Basescul
10-02-2020, 06:55 PM
Yep, I took a quick look with Romania4 and Romania5 who had issues and they work fine here.
Distance to: Romania4
2.88749026 Montenegrin
5.03665564 Serb
7.00624008 Croat
9.79695871 Greek_mainland
10.55597461 Albanian
11.56684054 Moldavian
11.83931586 Hungarian
14.24914383 British
17.63675140 Polish
18.62794138 German_East
19.32760720 EarlySlav
19.47360778 Ukrainian

Target: Romania4
Distance: 1.9156% / 1.91564203 | ADC: 0.25x
47.2 Montenegrin
23.6 Greek_mainland
20.6 Serb
5.6 Moldavian
3.0 Croat

Distance to: Romania5
7.33466427 Montenegrin
8.08181292 Greek_mainland
8.80423762 Serb
10.34410460 Croat
10.47838251 Albanian
13.61249059 Moldavian
14.99327516 Hungarian
18.08355054 British
19.78615930 Polish
21.54866817 Ukrainian
21.75220678 German_East
22.03236937 EarlySlav

Target: Romania5
Distance: 3.1267% / 3.12672721 | ADC: 0.25x
64.8 Greek_mainland
35.2 Moldavian
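The "Distance to:" lists above look like plain Euclidean distances between the target's component percentages and each row of the sheet. Here is a sketch of that calculation using the MDLP K11 rows vbnetkhio posted. The K11 coordinates of Romania4 and Romania5 are not given in the thread, so the example uses the Serb row as a stand-in target. The helper names are mine, and the 11 columns are left unnamed because the thread does not list them.

```python
from math import dist

# MDLP K11 population rows as posted in the thread (name followed by 11 component values).
SHEET = """\
British,0.00,1.00,0.00,0.00,1.67,33.37,1.22,26.00,0.00,0.00,36.10
Ukrainian,0.01,0.52,0.14,2.28,1.46,26.17,0.23,26.46,0.37,1.17,41.14
Serb,0.01,0.30,0.08,6.66,2.54,34.54,0.18,25.20,0.32,0.51,29.66
Polish,0.02,0.62,0.07,2.93,1.52,27.53,0.09,27.14,0.11,0.17,39.73
Hungarian,0.00,0.00,0.00,5.07,0.00,32.98,0.56,25.11,0.00,0.10,36.19
Greek_mainland,0.10,0.29,0.27,14.03,3.47,38.15,0.21,25.01,0.30,0.20,17.96
Montenegrin,0.00,0.41,0.21,7.55,3.03,36.73,0.06,24.66,0.11,0.42,26.81
Moldavian,0.29,0.22,0.33,5.45,2.06,29.13,0.23,25.75,1.16,0.84,34.39
Early Slav,1.16,0.85,0.08,0.60,1.24,28.47,0.00,25.44,0.41,0.39,41.35
German_East,0.00,0.00,0.00,0.00,0.73,30.66,0.00,26.97,0.00,0.79,40.86
Croat,0.00,0.32,0.10,6.33,1.66,33.56,0.34,25.47,0.19,0.47,31.55
Albanian,0.15,0.21,0.12,12.22,3.38,41.07,0.33,24.45,0.29,0.32,17.47
"""


def parse_sheet(text):
    """Turn 'name,v1,v2,...' lines into a {name: [floats]} dictionary."""
    rows = {}
    for line in text.strip().splitlines():
        name, *values = line.split(",")
        rows[name] = [float(v) for v in values]
    return rows


def rank_by_distance(target_name, rows):
    """Rank every other row by Euclidean distance to the target row, nearest first."""
    target = rows[target_name]
    ranking = [(name, dist(target, vec)) for name, vec in rows.items() if name != target_name]
    return sorted(ranking, key=lambda pair: pair[1])


rows = parse_sheet(SHEET)
ranking = rank_by_distance("Serb", rows)  # Serb is only a stand-in target
for name, distance in ranking:
    print(f"{distance:.8f} {name}")
```

With Serb as the target, the nearest row is Croat at about 2.35. To reproduce the lists above you would add the sample's own K11 row to the dictionary and pass its name as the target.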
Dirdepo
10-02-2020, 07:52 PM
Vlach gonna break the calculator fam, that just how goes
Vlach gonna break the calculator fam, that just how goes
Cumansky, you've been identified and reported, just so you know.