Population distances on GEDmatch Oracle and Vahaduo [Archive] - The Apricity Forum: A European Cultural Community

reboun

05-02-2021, 02:42 PM

Isn't there an inconsistency in how GEDmatch Oracle and Vahaduo determine population distances? Consider Eurogenes K13 and three imaginary people: the first scores 100% West Asian, the second 100% East Med, and the third 100% East Asian. According to the algorithm of GEDmatch Oracle and Vahaduo, the Euclidean distances between them are all equal, so all three would be genetically equidistant from each other. However, my intuition says that the first and second person would be much closer to each other and the third would be a genetic outlier, simply because the West Asian and East Med components are closer to each other while the East Asian component is further from both. GEDmatch Oracle and Vahaduo treat the components of a calculator as separate dimensions, so every pair of components is treated as equidistant. Isn't there something wrong with this calculation?
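The arithmetic behind this example can be checked with a minimal sketch. It assumes, as the post does, that the distance is a plain Euclidean distance over the component percentages; the three profiles are the imaginary people from the post, and the variable and function names are made up for illustration.

```python
import math

# Imaginary K13-style profiles in percent. Only the three components from
# the example are listed; every other component is assumed to be 0.
COMPONENTS = ["West_Asian", "East_Med", "East_Asian"]
person_1 = [100.0, 0.0, 0.0]  # 100% West Asian
person_2 = [0.0, 100.0, 0.0]  # 100% East Med
person_3 = [0.0, 0.0, 100.0]  # 100% East Asian


def euclidean(p, q):
    """Plain Euclidean distance between two admixture profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


d12 = euclidean(person_1, person_2)
d13 = euclidean(person_1, person_3)
d23 = euclidean(person_2, person_3)
print(round(d12, 2), round(d13, 2), round(d23, 2))  # 141.42 141.42 141.42
```

All three pairs come out at sqrt(100^2 + 100^2), because nothing in the formula knows which components are related to each other.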

Sandis

05-02-2021, 03:20 PM

Distances between clusters should also be taken into account.
I try to implement this in my algorithms: the more different the clusters, the greater the additional distance.

reboun

05-02-2021, 11:19 PM

Bump

reboun

05-03-2021, 12:24 PM

Bump

Petalpusher

05-03-2021, 12:52 PM

Oracle distances aren't as well correlated with real relationships as they would be if they were based on multi-dimensional Fst. Scoring, let's say, +/- 5% Atlantic or Baltic won't look much different from scoring 5% SSA or East Asian instead, even though two Euro components are far closer to each other than to any non-West Eurasian component. Even 5% should visibly pull someone outside the main Euro cluster at world scale.

This is also why mixed people often look unmixed on PCA, and why even South Americans with Amerindian admixture will often cluster visually with Eastern Europeans: their non-Euro admixture is not weighted and cannot be accurately represented in 2D either. Amerindian sits somewhere in the middle of Eurasia in 2D, but in reality it is not an intermediate between Europeans and East Asians (they are not exactly that mix); seen in three dimensions, it goes in another direction from this cline, somewhat parallel to it. This is less problematic for fully European samples but still creates some skew on "forum PCA". In other words, the only accurate way to see genetic clustering is in 3D and Fst-weighted.

https://vahaduo.github.io/3d/g25/
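The 2D point can be illustrated with a toy sketch. The coordinates below are invented purely for illustration and are not real PCA or G25 values; pop_a, pop_b and pop_c are hypothetical populations, with pop_c deviating from the a-b cline only along the third axis.

```python
import math

# Hypothetical 3D coordinates (NOT real PCA output).
pop_a = (0.0, 0.0, 0.0)
pop_b = (10.0, 0.0, 0.0)
pop_c = (5.0, 0.0, 8.0)  # off the a-b line, but only in the third dimension


def dist(p, q, dims):
    """Euclidean distance using only the first `dims` coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p[:dims], q[:dims])))


# Midpoint of the a-b cline.
midpoint = tuple((a + b) / 2 for a, b in zip(pop_a, pop_b))

print(dist(pop_c, midpoint, 2))  # 0.0 -> looks like a perfect intermediate in 2D
print(dist(pop_c, midpoint, 3))  # 8.0 -> clearly off the cline in 3D
```

On a 2D plot pop_c would sit exactly between pop_a and pop_b, even though it is not a mixture of the two.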

vbnetkhio

05-03-2021, 01:21 PM

Calculators are built to exaggerate the differences between closely related components like West Asian and East Med. That way we can differentiate better between populations.
Otherwise all West Eurasians would be much closer to each other.

For some calculators, Fst distances between components were published, so you can check how closely the components are related to each other.

Petalpusher

05-03-2021, 02:21 PM

Fst is very useful for checking the affinity between components. It's not a distance per se as we think of it in Oracle, but it is actually more important: it shows how the components are related to each other in the grand scheme of things. A lower value means a closer relationship and greater similarity.

K15 https://docs.google.com/file/d/0B9o3EYTdM8lQS3VvTUYyYXd0akk/edit

A few examples:

Relationship to Sub-Saharan:

1. Northeast_African @ 42
2. South_Asian @ 131
3. East_med @ 132
4. Red_Sea @ 139
5. West_Asian @ 139
6. Eastern @ 142
7. Atlantic @ 144
8. North_Sea @ 144
9. Baltic @ 148
10. West_Med @ 149
11. SE_Asian @ 164
12. Siberian @ 171
13. Amerindian @ 204
14. Oceanian @ 219

Relationship to Northeast_African:

1. Subsaharan @ 42
2. East_med @ 94
3. South_Asian @ 102
4. West_Asian @ 103
5. Red_Sea @ 106
6. Eastern @ 108
7. Atlantic @ 108
8. North_Sea @ 109
9. West_Med @ 112
10. Baltic @ 114
11. SE_Asian @ 137
12. Siberian @ 144
13. Amerindian @ 178
14. Oceanian @ 195

Relationship to Oceanian:

1. South_Asian @ 145
2. SE_Asian @ 166
3. Eastern @ 173
4. East_med @ 174
5. W_Asian @ 176
6. North_Sea @ 178
7. Siberian @ 178
8. Atlantic @ 179
9. Baltic @ 181
10. West_Med @ 188
11. Red_Sea @ 190
12. Northeast_African @ 195
13. Amerindian @ 217
14. Subsaharan @ 219
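As a rough sketch of how such Fst values could be used to weight a distance between two admixture profiles, the snippet below uses only the three components whose mutual Fst values are all listed above (Subsaharan, Northeast_African, Oceanian). The weighting formula is an assumption chosen for illustration, not the method used by Oracle, Vahaduo or any spreadsheet mentioned in this thread: it treats the Fst table as a matrix F and computes -1/2 * d^T F d for the difference d between two profiles, which reduces to the listed Fst value when both profiles are 100% a single component.

```python
# Pairwise Fst values exactly as listed above, restricted to three
# components. Rows and columns follow COMPONENTS order.
COMPONENTS = ["Subsaharan", "Northeast_African", "Oceanian"]
FST = [
    [0, 42, 219],
    [42, 0, 195],
    [219, 195, 0],
]


def fst_weighted(p, q):
    """Illustrative Fst-weighted dissimilarity between two profiles.

    p and q are fractions summing to 1, in COMPONENTS order.
    Assumed formula (not from the thread): -1/2 * d^T F d with d = p - q.
    Clamped at 0 because an arbitrary Fst table is not guaranteed to
    keep this expression non-negative.
    """
    d = [a - b for a, b in zip(p, q)]
    n = len(d)
    quad = sum(d[i] * d[j] * FST[i][j] for i in range(n) for j in range(n))
    return max(0.0, -0.5 * quad)


pure_ssa = [1, 0, 0]
pure_nea = [0, 1, 0]
pure_oce = [0, 0, 1]
print(fst_weighted(pure_ssa, pure_nea))  # 42.0
print(fst_weighted(pure_ssa, pure_oce))  # 219.0

# Same 5% shift away from pure Northeast_African, in two directions.
shift_ssa = fst_weighted([0.05, 0.95, 0.0], pure_nea)
shift_oce = fst_weighted([0.0, 0.95, 0.05], pure_nea)
print(shift_oce > shift_ssa)  # True
```

Under a plain Euclidean distance the two 5% shifts would count the same; with this weighting the shift towards the more distant component counts for more, in proportion to the Fst values.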

Ajeje Brazorf

06-29-2021, 02:25 PM

Is there a way to create weighted distances with Eurogenes K15?

Petalpusher

06-29-2021, 02:51 PM

I did this a few years ago with K10: just an easy way for people to input their values and get an Fst type of distance/relationship to something in particular, but not Oracle distances per se.

https://docs.google.com/spreadsheets/d/14raDu1IyRzrl22tj0N1kwQ2BiE4BSZytw4AWoeMm7mU/edit#gid=0

Lots of data, with some members from here and other forums like Anthrogenica:
https://docs.google.com/spreadsheets/d/14raDu1IyRzrl22tj0N1kwQ2BiE4BSZytw4AWoeMm7mU/edit#gid=1821373170

Zanzibar

06-29-2021, 03:02 PM

Would population distance in G25 be more accurate than those of Gedmatch?

Petalpusher

06-29-2021, 03:11 PM

They are accurate for what they are. You could replace the components with wood, water and fire and it would still be "accurate"; it just doesn't show you a real relationship. Within homogeneous groups it's generally fine, because their members tend to score the same closely related components. Getting, for example, Slovakia @ 12.89 doesn't tell you much: in which direction from it (north, south, east), and whether because of SSA, East Asian, Amerindian, some Euro-centric components, etc. The same distance can arise for very different reasons in different people. Multi-population fits already give a better idea of the positioning.
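A small sketch of the "same distance, different reasons" point, assuming a plain Euclidean distance. The reference and the two samples are invented numbers on a hypothetical three-component calculator, not real results.

```python
import math

# Hypothetical profiles in percent (invented for illustration).
COMPONENTS = ["Baltic", "Atlantic", "Subsaharan"]
reference = [50.0, 50.0, 0.0]
sample_a = [55.0, 45.0, 0.0]  # differs only within the two Euro components
sample_b = [45.0, 50.0, 5.0]  # differs by 5% Subsaharan


def euclidean(p, q):
    """Plain Euclidean distance between two admixture profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def component_deltas(sample, ref):
    """Per-component difference, i.e. the 'direction' a single distance hides."""
    return {name: s - r for name, s, r in zip(COMPONENTS, sample, ref)}


print(round(euclidean(sample_a, reference), 2))  # 7.07
print(round(euclidean(sample_b, reference), 2))  # 7.07
print(component_deltas(sample_a, reference))
print(component_deltas(sample_b, reference))
```

Both samples get the same distance to the reference, and only the per-component differences show that they deviate from it in different directions.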

I don't have G25, so I have no real opinion on it, and I'm not sure you can get it currently (although Davidski might still have my data; I got his k7B a while ago).