Originally Posted by
ScandinavianCelt
Because of their plots on the PCA and where they represent as a result:
I have this issue with a lot of your calculators; you mix time periods a lot. The issue with you using the Vahaduo West Eurasia PCA is that the Principal Components of that PCA are scaled for West Eurasia. What does that mean? It's the same reason why West Africa is sitting right next to South Africa on your image, despite the distance being:
Distance to: West_Africa
0.39236311 South_Africa
For context, here is the distance between your East Europe and North Europe sources, which sit a similar "visual" distance apart on the plot (actually, even further apart visually):
Distance to: North_Europe
0.04682976 East_Europe
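To make the scaling point concrete, here is a minimal Python sketch. It assumes the "Distance to:" values are plain Euclidean distances over all of the coordinates, and it simplifies a 2D plot down to "only two axes get drawn", which is not exactly how a regional PCA is built. The vectors are made-up toy values, not real samples, and `plot_distance` is just a name I picked for illustration.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def plot_distance(a, b, axes=(0, 1)):
    """Distance as it would look on a 2D plot that only draws two axes."""
    return euclidean([a[i] for i in axes], [b[i] for i in axes])

# Hypothetical toy coordinates (NOT real G25 data). The two populations
# differ mostly on dimensions that the plot does not draw.
pop_a = [0.10, 0.20, 0.30, -0.10]
pop_b = [0.11, 0.21, -0.05, 0.15]

full = euclidean(pop_a, pop_b)
seen = plot_distance(pop_a, pop_b)
print(f"full distance: {full:.4f}, distance on plot: {seen:.4f}")
```

Two points can sit almost on top of each other on the plot while being far apart in the full coordinate space, which is the West Africa / South Africa situation in your image.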
If you're making a global regions calculator and want to use a PCA, fine, but pick one that actually represents the dataset. You also have this issue with CHG appearing
Anyway, the issue I have with you including WHG and CHG is one of shared gene flow. No one today is a WHG or a CHG; those are now "components" of modern ancestries. By including them, you are not only messing up people's calculations but also creating a weird discontinuity.
This is not exactly how it works, but it illustrates the concept. Let's say that, if we ran you through a Neolithic model, you came out as 1/2 CHG, 3/8 ANF and 1/8 Zagros. I will use these samples, including your CHG source, to simulate "you":
Code:
IRN_Ganj_Dareh_N,0.0430252,0.0664158,-0.1550722,0.0047158,-0.122669,0.0235384,0.017109,-0.0011998,-0.082546,-0.0544158,-0.0028258,-0.0016186,0.0044896,-0.0062756,0.0316498,0.0561384,-0.0054242,0.0068664,0.0136508,-0.0334162,0.00856,-0.028836,-0.0110678,-0.039331,0.0222254
CHG,0.091058,0.102568,-0.083344,-0.00323,-0.08617,0.020638,0.024911,-0.001846,-0.128236,-0.074717,-0.006333,0.023979,-0.054856,0.004404,0.026601,-0.03275,0.02386,-0.013429,-0.022249,0.034767,0.033815,-0.007048,0.006532,-0.025787,-0.002036
Anatolia_Barcin_N,0.1175998,0.180118,0.0035312,-0.101158,0.0510443,-0.0483875,-0.0043582,-0.0069334,0.0362287,0.0807473,0.0079718,0.0118803,-0.0234545,0.0004691,-0.0419807,-0.0101913,0.0233091,0.0019866,0.0136954,-0.0097489,-0.0142249,0.0057723,-0.0041232,-0.0031658,-0.0043437
Code:
Simulated_person,0.0950071,0.1271302,-0.0597318,-0.0389598,-0.039277,-0.004884,0.0129598,-0.003673,-0.0608505,-0.0138802,-0.0005303,0.0162423,-0.0356622,0.0015935,0.001514,-0.0131794,0.0199929,-0.0051112,-0.0042824,0.0095506,0.0126432,-0.0049639,0.0003363,-0.018997,0.0001313
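For anyone who wants to reproduce that line, a minimal Python sketch follows. It assumes the simulated person is nothing more than a weighted average of the three coordinate rows above (1/2 CHG, 3/8 Barcin for ANF, 1/8 Ganj Dareh for Zagros); the `mix` helper is a name made up for this example.

```python
# Coordinates copied from the rows above (25 scaled components each).
SOURCES = {
    "IRN_Ganj_Dareh_N": [
        0.0430252, 0.0664158, -0.1550722, 0.0047158, -0.122669,
        0.0235384, 0.017109, -0.0011998, -0.082546, -0.0544158,
        -0.0028258, -0.0016186, 0.0044896, -0.0062756, 0.0316498,
        0.0561384, -0.0054242, 0.0068664, 0.0136508, -0.0334162,
        0.00856, -0.028836, -0.0110678, -0.039331, 0.0222254,
    ],
    "CHG": [
        0.091058, 0.102568, -0.083344, -0.00323, -0.08617,
        0.020638, 0.024911, -0.001846, -0.128236, -0.074717,
        -0.006333, 0.023979, -0.054856, 0.004404, 0.026601,
        -0.03275, 0.02386, -0.013429, -0.022249, 0.034767,
        0.033815, -0.007048, 0.006532, -0.025787, -0.002036,
    ],
    "Anatolia_Barcin_N": [
        0.1175998, 0.180118, 0.0035312, -0.101158, 0.0510443,
        -0.0483875, -0.0043582, -0.0069334, 0.0362287, 0.0807473,
        0.0079718, 0.0118803, -0.0234545, 0.0004691, -0.0419807,
        -0.0101913, 0.0233091, 0.0019866, 0.0136954, -0.0097489,
        -0.0142249, 0.0057723, -0.0041232, -0.0031658, -0.0043437,
    ],
}

# Proportions from the text: 1/2 CHG, 3/8 ANF, 1/8 Zagros.
WEIGHTS = {"CHG": 1 / 2, "Anatolia_Barcin_N": 3 / 8, "IRN_Ganj_Dareh_N": 1 / 8}

def mix(sources, weights):
    """Weighted average of coordinate vectors; the weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    dims = len(next(iter(sources.values())))
    return [
        sum(weights[name] * sources[name][i] for name in weights)
        for i in range(dims)
    ]

simulated = mix(SOURCES, WEIGHTS)
print("Simulated_person," + ",".join(f"{v:.7f}" for v in simulated))
```

The leading components come out at the same values as the Simulated_person row, so a simple weighted average is at least consistent with what was pasted.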
Distance to: Simulated_person
0.03163327 Georgian_West
0.03269014 Georgian_Imer
0.03304694 Georgian_Lechkhumi
0.03318913 Georgian_Megr
0.03418862 Georgian_Ajar
0.03577044 Abkhasian
0.03580472 Georgian_Kakh
0.03590209 Georgian_Guria
0.03628262 Georgian_Svaneti
0.03647775 Abkhasian_Gudauta
0.03669752 Georgian_Ratcha
0.03695650 Georgian_Javakheti
0.03973724 Ahiska
0.03992211 Georgian_Kart
0.04143626 Georgian_Samtckhe
0.04159063 Georgian_Laz
0.04296016 Georgian_Mtiuleti
0.04539320 Andian_A
0.04613416 Georgian_NorthEast
0.04745903 Georgian_Tush
0.04836378 Georgian_Khevs
0.04909604 Armenian_Hemsheni
0.05218545 Ossetian
0.05342393 Turkish_Erzurum
0.05423057 Udi
So basically, "you" are Georgian. But because this sheet has no Caucasian reference, your best single distance on it is poor:
Code:
Distance to: Simulated_person
0.08908903 SW_Asia
0.11693770 Middle_East
0.12735398 Mediterranean_Islands
0.12945364 CHG
0.15843371 South_Europe
0.19144151 Central_Europe
0.20774026 North_Europe
0.21097567 East_Europe
0.22316297 North_Africa
0.24327719 Central_Asia
But when you run Single, you get a result like this:
Code:
Target: Simulated_person
Distance: 3.1609% / 0.03160923
47.6 CHG
26.6 South_Europe
25.2 Middle_East
0.6 Mediterranean_Islands
"Wow," you think, "that fit is not bad." But what is this even saying about you? It's not saying anything. You're not CHG + South Europe + Middle East + Mediterranean Islands; you're 1/2 CHG + 3/8 ANF + 1/8 Zagros, or in modern terms a normal Caucasian, or even more specifically Georgian. The CHG is already built into a Caucasian reference; if you include such a reference instead, the calculator will actually be saying something, and you'll also be preventing overfitting issues because you are minimizing overlap in your sources.
It's similar for WHG. A better fit doesn't mean anything if the source that produces it is actively making the output of the calculator less meaningful. When is including WHG going to do anything except make the results confusing? We can take a look at a steppe-heavy source like Estonian for this:
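The overlap problem can be shown with a toy example. This is a minimal Python sketch with made-up 3-dimensional coordinates (not real data), where one "modern" source is itself an even mix of two "ancient" sources. All the names here are hypothetical.

```python
import math

def mix(vectors, weights):
    """Weighted average of equal-length coordinate vectors."""
    dims = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dims)]

def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy coordinates, NOT real samples.
ancient_a = [0.10, 0.00, -0.05]
ancient_b = [0.00, 0.12, 0.03]
# A "modern" reference that is itself half ancient_a and half ancient_b.
modern = mix([ancient_a, ancient_b], [0.5, 0.5])

# The person being modelled has exactly that same ancestry.
target = mix([ancient_a, ancient_b], [0.5, 0.5])
sources = [ancient_a, ancient_b, modern]

# Three very different-looking breakdowns of the same person.
candidates = {
    "100% modern": [0.0, 0.0, 1.0],
    "50% a + 50% b": [0.5, 0.5, 0.0],
    "25% a + 25% b + 50% modern": [0.25, 0.25, 0.5],
}
fits = {label: distance(target, mix(sources, w)) for label, w in candidates.items()}
for label, d in fits.items():
    print(f"{label}: distance {d:.8f}")
```

All three breakdowns fit equally well, so once an ancient component and a modern population that already contains it are both in the source list, the percentages stop being interpretable on their own.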
Code:
Estonian,0.1329454,0.1141455,0.087756,0.0833017,0.0427771,0.0297575,0.0113745,0.0144457,0.0004091,-0.028447,6.5e-05,-0.0123489,0.0194597,0.0213591,-0.0066639,0.0017104,0.0002216,-0.0024832,0.0013451,0.0022137,0.001385,-0.0018672,0.0042644,0.0007471,0.0030535
Distance to: Estonian
0.03238333 East_Europe
0.05562343 North_Europe
0.06114073 Central_Europe
0.16606226 South_Europe
0.18535247 Mediterranean_Islands
0.19009880 SW_Asia
0.24757386 Middle_East
0.26053162 Central_Asia
0.28400543 North_Africa
0.29677406 CHG
0.30689766 North_India
0.30819570 WHG
0.35223273 America_Arctic
0.35357397 Siberia
0.40419210 South_Asian
0.54116037 SE_Asia
0.55999437 NE_Asia
0.57830066 America_North
0.58914208 Oceania
0.60245303 East_Asian
0.61344885 East_Africa
0.68072865 America_Central
0.70205055 America_South
0.77933863 West_Africa
0.84025046 South_Africa
When we run it on your calculator as it is right now, with ADC:0.25, we get the following:
Code:
Target: Estonian
Distance: 2.7514% / 0.02751353 | ADC: 0.25x RC
94.8 East_Europe
5.2 WHG
But still, Estonians are not... East_European + WHG. Most East Europeans already have some WHG, and even then, what defines Estonian genetic composition is actually a large amount of EHG, not WHG. So that result has no meaning. Does it mathematically work better? Yes, but the point of calculators is not to maximize a fit. If we remove the WHG source, we get:
Code:
Target: Estonian
Distance: 3.2345% / 0.03234489 | ADC: 0.25x RC
96.6 East_Europe
3.4 North_Europe
(For the sake of brevity I will not include it here, but whilst there is still a tiny bit of noise when running ADC:NO, removing WHG decreased it by a bit.)
Now we have a modern source + a modern source, and it makes more sense.
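On the "does it mathematically work better?" point: adding a source can never make the best achievable distance worse, because the fitter is always free to give the new source 0%. Here is a minimal Python sketch of that. It uses a brute-force grid search over percentages, which is my stand-in and not the actual Vahaduo algorithm, it ignores ADC entirely, and the coordinates are made-up toy values.

```python
import math
from itertools import product

def mix(vectors, weights):
    """Weighted average of equal-length coordinate vectors."""
    dims = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dims)]

def distance(a, b):
    """Euclidean distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_fit(target, sources, steps=50):
    """Try every weight combination in multiples of 1/steps that sums to 1.

    Returns (best_distance, best_weights). Brute force, so only suitable
    for a handful of sources.
    """
    best = (float("inf"), None)
    for combo in product(range(steps + 1), repeat=len(sources) - 1):
        if sum(combo) > steps:
            continue
        # The last source takes whatever share is left over.
        weights = [c / steps for c in combo] + [(steps - sum(combo)) / steps]
        d = distance(target, mix(sources, weights))
        if d < best[0]:
            best = (d, weights)
    return best

# Hypothetical toy coordinates, NOT real samples.
target = [0.13, 0.11, 0.09]
modern_1 = [0.12, 0.10, 0.08]
modern_2 = [0.13, 0.13, 0.02]
ancient_extra = [0.13, 0.08, 0.20]

fit_two = best_fit(target, [modern_1, modern_2])
fit_three = best_fit(target, [modern_1, modern_2, ancient_extra])
print(f"two sources:   {fit_two[0]:.6f} {fit_two[1]}")
print(f"three sources: {fit_three[0]:.6f} {fit_three[1]}")
```

The three-source distance is always less than or equal to the two-source one, so a lower distance after adding WHG is expected by construction and says nothing about whether the breakdown is meaningful.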