
Which is the most accurate cluster/genetic map out there?



Zroota
10-26-2019, 06:21 AM
This is from Eupedia, which I usually see around. Is it even accurate or is it too generic?

https://i.imgur.com/BIMJlpt.png

vbnetkhio
10-26-2019, 07:18 AM
This is from Eupedia, which I usually see around. Is it even accurate or is it too generic?

https://i.imgur.com/BIMJlpt.png

looks accurate. it looks like a PCA based on Eurogenes k15 or k36, which are pretty accurate

also looking at these clusters i'm pretty sure these academic samples were used to make this plot:
https://www.theapricity.com/forum/showthread.php?245352-Random-GEDmatch-kits
https://www.theapricity.com/forum/showthread.php?301242-Balkans-Central-Europe-Ukraine-academic-samples-on-gedmatch

this is a plot based on k15. not really a PCA, but very similar. maybe even better.
https://gen3553.pagesperso-orange.fr/ADN/K15.htm

here i made a plot based on k36
https://www.theapricity.com/forum/showthread.php?303248-k36-schematic-map-(-quot-PCA-quot-)

Calpurnius
10-26-2019, 01:59 PM
Sort of accurate for relatively pure West Eurasians, but Maghrebis end up closer to everyone else than they really are. Their African/Iberomaurusian ancestry only shows properly on a global PCA with African samples, which would give more realistic results; there just isn't enough space in two dimensions for all that variation.
Here's mine with some ancients:
https://i.imgur.com/8gOTzT1.png

Zroota
10-26-2019, 02:14 PM
Sort of accurate for relatively pure West Eurasians, but Maghrebis end up closer to everyone else than they really are. Their African/Iberomaurusian ancestry only shows properly on a global PCA with African samples, which would give more realistic results; there just isn't enough space in two dimensions for all that variation.
Here's mine with some ancients:
https://i.imgur.com/8gOTzT1.png
Nice.

How do you make these? Or, where do you find them?

vbnetkhio
10-26-2019, 02:19 PM
Nice.

How do you make these? Or, where do you find them?

to find them, google "autosomal PCA"

you can make them in the free program PAST: https://folk.uio.no/ohammer/past/

Calpurnius
10-26-2019, 02:44 PM
Nice.

How do you make these? Or, where do you find them?
I've done these in R by reprocessing G25 scaled coordinates. Basically, once I have a subset of populations I'm interested in, I perform a PCA on this subset using the function prcomp. To make the plot, I use the library ggplot2.
I can post the code for those interested, but you have to be familiar with R and have some libraries installed (stats, ggplot2, dplyr, tidyr, ggrepel).
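
For readers who want to see what "PCA on a subset" amounts to, here is a minimal sketch in plain Python rather than R. It is not the poster's code. The sample names and coordinates are made up, and where prcomp uses an SVD this sketch swaps in power iteration with deflation to keep it dependency-free.

```python
import math
import random

# Hypothetical stand-in for a subset of G25-style scaled coordinates:
# made-up 3-dimensional rows, not real data.
samples = {
    "Sample_A": [0.10, 0.02, 0.00],
    "Sample_B": [0.12, 0.03, 0.01],
    "Sample_C": [-0.05, 0.00, 0.02],
    "Sample_D": [-0.08, -0.01, -0.01],
}

def matvec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def pca(rows, k=2, iters=500, seed=0):
    """Return (components, variances, scores) for the top k principal axes."""
    n, d = len(rows), len(rows[0])
    # Center each column, as prcomp does by default.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (divides by n - 1).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    components, variances = [], []
    for _ in range(k):
        # Power iteration from a random unit vector.
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(cov, v)))
        components.append(v)
        variances.append(lam)
        # Deflate: remove the axis just found so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    scores = [[sum(a * b for a, b in zip(r, c)) for c in components]
              for r in centered]
    return components, variances, scores

components, variances, scores = pca(list(samples.values()))
for name, (pc1, pc2) in zip(samples, scores):
    print(f"{name}: PC1={pc1:+.4f} PC2={pc2:+.4f}")
```

The sign of each axis is arbitrary, so a plot made this way may come out mirrored relative to one from another tool. The PC1/PC2 scores are what would go on the x and y axes of the scatter plot.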

Pine
10-26-2019, 02:49 PM
Eupedia is garbage in general, but that particular PCA doesn't seem bad. It's clearly done using Past3.

Lemgrant
10-26-2019, 03:09 PM
3d pca is way better than 2d pca, because it has more information (tutorial here >> https://www.theapricity.com/forum/showthread.php?177271-Tutorial-for-autosomal-calculators&p=6169153&viewfull=1#post6169153). In 2d pca, two points can appear very close when in fact they are very distant. Also RStudio is a lot better than the Past3 program, and it has the ggrepel library, which keeps plots readable with many samples.
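
A small illustration of the point about 2d plots hiding distance, using two made-up samples whose coordinates are assumptions chosen only to show the effect:

```python
import math

# Hypothetical (PC1, PC2, PC3) coordinates for two made-up samples.
sample_a = (0.10, 0.05, -0.20)
sample_b = (0.11, 0.04, 0.25)

# Distance as it would appear on a 2d plot (PC1 and PC2 only).
dist_2d = math.dist(sample_a[:2], sample_b[:2])
# Distance once the third component is included.
dist_3d = math.dist(sample_a, sample_b)

print(f"2d distance: {dist_2d:.4f}")
print(f"3d distance: {dist_3d:.4f}")
```

The two samples nearly overlap on PC1/PC2 but sit far apart on PC3, so the 2d plot alone would suggest they are close.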

Ibericus
10-26-2019, 10:08 PM
Sort of accurate for relatively pure West Eurasians, but Maghrebis end up closer to everyone else than they really are. Their African/Iberomaurusian ancestry only shows properly on a global PCA with African samples, which would give more realistic results; there just isn't enough space in two dimensions for all that variation.

that's correct, in a west-eurasia PCA the SSA factor is not taken into account, hence the north-africans get much closer to other populations than they really are. in reality they are 25% SSA

Zroota
10-27-2019, 05:29 AM
Ph2ter has pretty accurate cluster maps of Eurasia IMO:

https://www.theapricity.com/forum/showthread.php?304750-A-dolphin-plot-of-Eurasia

Lemgrant
10-27-2019, 09:51 AM
Ph2ter has pretty accurate cluster maps of Eurasia IMO:

https://www.theapricity.com/forum/showthread.php?304750-A-dolphin-plot-of-Eurasia

t-sne plots are great, but one needs to play with the parameters in order to get a result that makes sense. And how do you know which result makes sense? For that you need to look at a 3d pca or a 3d mds with euclidean distances.

read this: https://distill.pub/2016/misread-tsne/

ph2ter
10-27-2019, 10:34 AM
t-sne plots are great, but one needs to play with the parameters in order to get a result that makes sense. And how do you know which result makes sense? For that you need to look at a 3d pca or a 3d mds with euclidean distances.

read this: https://distill.pub/2016/misread-tsne/

Yes, you must understand what t-SNE does with the data.
These two plots nicely show the difference between t-SNE and PCA:
PCA:
https://2.bp.blogspot.com/-0UL3RblDglg/V0Ml3ZjbrzI/AAAAAAAABGI/nRi8sSxQ4o0jJOKWxGOTKU8uA_1Erlw_QCLcB/s640/pca.png

t-SNE on the same data tends to group the samples into distinct clusters according to the dominant part of their admixture (neglecting the minor part):
https://4.bp.blogspot.com/-2cHO8T4lAgg/V0MkvZqFQBI/AAAAAAAABF8/Vk4w8aQsibYiFVHZ8SkvLbf_erHrPpbTQCLcB/s640/tsne.png
Distances between the samples are not preserved, and the arrangement of the clusters doesn't mean much unless you use a high perplexity value. In my last 'dolphin' plot I used a very high perplexity, which also preserved the relations between the clusters. It is a very flexible algorithm.
And I think that my plots make sense. I've experimented with the data more than anyone here.

andre
10-27-2019, 11:19 PM
Bulgarians tend to be more north shifted than Romanians? I don’t think so.

Zroota
10-31-2019, 11:15 AM
Bulgarians tend to be more north shifted than Romanians? I don’t think so.
To be fair, you can see that Bulgarians have more southern shifted people than the Romanians (at the same time having more northern shifted ones). So it's sort of a "win-win" in your case.

vbnetkhio
10-31-2019, 12:19 PM
Bulgarians tend to be more north shifted than Romanians? I don’t think so.

there is nothing wrong with the pca algorithm, the author just used south Romanians.
Romanians are very similar to Bulgarians as a whole, maybe just a little bit more north shifted on average

vbnetkhio
10-31-2019, 12:43 PM
Yes, you must understand what t-SNE does with the data.
These two plots nicely show the difference between t-SNE and PCA:
PCA:
https://2.bp.blogspot.com/-0UL3RblDglg/V0Ml3ZjbrzI/AAAAAAAABGI/nRi8sSxQ4o0jJOKWxGOTKU8uA_1Erlw_QCLcB/s640/pca.png

t-SNE on the same data tends to group the samples into distinct clusters according to the dominant part of their admixture (neglecting the minor part):
https://4.bp.blogspot.com/-2cHO8T4lAgg/V0MkvZqFQBI/AAAAAAAABF8/Vk4w8aQsibYiFVHZ8SkvLbf_erHrPpbTQCLcB/s640/tsne.png
Distances between the samples are not preserved, and the arrangement of the clusters doesn't mean much unless you use a high perplexity value. In my last 'dolphin' plot I used a very high perplexity, which also preserved the relations between the clusters. It is a very flexible algorithm.
And I think that my plots make sense. I've experimented with the data more than anyone here.

do you think Ward method clustering is good for dividing a bunch of k15/k36 etc. samples into regional clusters?

ph2ter
10-31-2019, 07:29 PM
do you think Ward method clustering is good for dividing a bunch of k15/k36 etc. samples into regional clusters?

I don't have an opinion about that, because Past throws an error when used with more than 1000 or so samples and I haven't tried clustering in any of the other tools.

vbnetkhio
10-31-2019, 07:43 PM
I don't have an opinion about that, because Past throws an error when used with more than 1000 or so samples and I haven't tried clustering in any of the other tools.

it can be done in R. i like it better than k-means because it isn't stochastic. but i'm not sure whether it is good for this purpose
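
To show why Ward clustering gives the same answer on every run, here is a naive pure-Python sketch of the method. The points are made up rather than real k15/k36 values, and R's hclust would be the practical choice for real data. K-means usually starts from randomly chosen centers. Ward's method instead always merges the pair of clusters whose union adds the least within-cluster variance, so nothing in it depends on a random seed.

```python
def centroid(points, members):
    """Mean position of the points whose indices are in members."""
    dims = len(points[0])
    return [sum(points[i][d] for i in members) / len(members)
            for d in range(dims)]

def ward_cost(points, a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(points, a), centroid(points, b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_clusters(points, k):
    """Merge greedily until k clusters remain; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        # Cheapest merge; ties are broken by index order, so runs are repeatable.
        _, i, j = min(
            (ward_cost(points, clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical 2d coordinates forming three loose groups.
toy_points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (5.0, 5.0), (5.1, 5.0),
              (9.0, 0.0), (9.0, 0.2)]

result = ward_clusters(toy_points, 3)
print(result)
```

This recomputes every pairwise cost at each step, which is fine for a handful of points but far too slow for thousands of samples.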

Adamastor
11-05-2019, 07:38 PM
that's correct, in a west-eurasia PCA the SSA factor is not taken into account, hence the north-africans get much closer to other populations than they really are. in reality they are 25% SSA

That would make the more North African admixed Iberians (who are around 12-13% NA) around 4% SSA. I don't think North Africans reach 25-30% of true SSA. They do have real SSA, but not that amount. A good portion of their SSA is Taforalt/Iberomaurusian related, not modern Yoruban/Bantu-like black African. Though they do have between 10 and 15% "true" SSA ("black") due to the slave trade.
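
The "around 4%" figure above follows from multiplying the two shares quoted in the thread. A quick check of that arithmetic, using only those percentages and endorsing none of them:

```python
# Shares quoted in the thread (as fractions), not independent estimates.
na_share_in_iberians = (0.12, 0.13)          # "around 12-13% NA"
ssa_share_in_north_africans = (0.25, 0.30)   # "25-30%" SSA

# Implied SSA share = NA share in Iberians * SSA share in North Africans.
low = na_share_in_iberians[0] * ssa_share_in_north_africans[0]
high = na_share_in_iberians[1] * ssa_share_in_north_africans[1]

print(f"implied SSA share: {low:.1%} to {high:.1%}")
```

So the implied range runs from 3% to just under 4%, and the "around 4%" in the post corresponds to the upper end.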

Leto
11-05-2019, 07:45 PM
That would make the more North African admixed Iberians (who are around 12-13% NA) around 4% SSA. I don't think North Africans reach 25-30% of true SSA. They do have real SSA, but not that amount. A good portion of their SSA is Taforalt/Iberomaurusian related, not modern Yoruban/Bantu-like black African. Though they do have between 10 and 15% "true" SSA ("black") due to the slave trade.
They also prolly have a good amount of "Red Sea" admixture which contributes to their overall darkness.

Adamastor
11-05-2019, 07:53 PM
They also prolly have a good amount of "Red Sea" admixture which contributes to their overall darkness.

Yeah, Red_Sea and Southwest Asian components, despite being considered "Caucasoid" components, have a ton of early Basal Eurasian, and these Basal groups had metrical affinities with Sub-Saharan Africans despite being genetically different.