South Slav members PCA



Jana
10-10-2021, 06:41 PM
https://i.imgur.com/y1BEMbN.png

rothaer
10-10-2021, 07:17 PM
https://i.imgur.com/y1BEMbN.png

Strange that Slovenes fall entirely within the range of Croats. Is that representative, or a result of the low number of testees?

Jana
10-10-2021, 07:20 PM
Strange that Slovenes fall entirely within the range of Croats. Is that representative, or a result of the low number of testees?

Not that representative: there are only 3 samples, 2 of them half Slovene (half other South Slav) and the third 3/4 Slovene.

FinalFlash
10-10-2021, 07:21 PM
Bosnians overlapping with both Serbs and Croatians makes perfect sense. The Slovenian cluster is a bit strange though.

Jana
10-10-2021, 07:24 PM
I find it unexpected that the few Montenegrin members we have plot further east than Macedonians do.

TheMaestro
10-10-2021, 07:48 PM
I find it unexpected that the few Montenegrin members we have plot further east than Macedonians do.

Where would I plot? Do you know, by any chance?

Jana
10-10-2021, 07:58 PM
Where would I plot? Do you know, by any chance?

Somewhere in the Serb cluster, I'm pretty sure. If Romanians were included, you might fit even better there.

Peterski
10-10-2021, 08:56 PM
Can you also add Hungarian members to this PCA?

I suspect they will overlap with Slovenes and Croats.

Komintasavalta
10-10-2021, 09:37 PM
Here's a PCA made the regular way where I didn't account for FST. The colored groups are based on cutting a hierarchical clustering tree at the height where it has 16 subtrees. Each user is connected with a line to their two closest neighbors:

https://i.ibb.co/qpGm1tg/1.png
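For anyone wanting to reproduce a plot like this, here is a minimal sketch of a "regular" PCA on raw K13 percentages together with the two-nearest-neighbour links. It assumes an SVD-based PCA and plain Euclidean distances in the full 13-dimensional K13 space for the neighbour search; the post does not say which distance or software was actually used, and the helper names `pca` and `two_nearest` are hypothetical.

```python
import numpy as np

# Sketch only: SVD-based PCA and Euclidean neighbours are assumptions,
# and `pca` / `two_nearest` are hypothetical helper names.

def pca(X, n_components=2):
    """Project the rows of X onto their first principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # centre every K13 column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T              # coordinates on PC1..PCn

def two_nearest(X):
    """Indices of the two closest other samples for each row (Euclidean)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # a sample is not its own neighbour
    return np.argsort(d, axis=1)[:, :2]

# Four rows from the Moldovan block posted later in this thread.
X = np.array([
    [20.94, 32.81, 16.13, 11.69, 12.21, 0.97, 2.06, 1.03, 1.08, 0.54, 0.54, 0, 0],
    [19.44, 37.30, 15.60, 10.04, 13.19, 0.47, 0.95, 1.49, 0.28, 0.41, 0.82, 0, 0],
    [24.14, 32.44, 14.55, 11.75, 12.15, 0, 1.62, 0, 1.54, 0.96, 0, 0.33, 0.51],
    [25.68, 29.49, 15.21, 9.61, 12.77, 1.72, 0.62, 1.82, 2.81, 0, 0.27, 0, 0],
])
coords = pca(X)          # one (PC1, PC2) point per user
links = two_nearest(X)   # draw a line from each user to these two rows
```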

When I multiplied the matrix of admixture weights by an MDS matrix of the FST matrix, Morti, a part-Tatar user, and Milenko_uncle_in_law plotted far from the other samples on PC2, so I also made a second PCA with those three outliers removed:

https://i.ibb.co/mC3vGKs/4.png
https://i.ibb.co/ZgDfVtn/fst1.png

I used Mantel's test to show that multiplying the matrix of admixture weights by an MDS matrix of the FST matrix greatly improves the correlation with f2 distances: https://anthrogenica.com/showthread.php?22402-Mantel-s-Test-G25-vs-genetic-distances. However, it may not be very useful when you compare samples from related European populations, because it also amplifies differences in noise-level admixture from non-European components. (For example, in the two plots above that were multiplied by MDS of FST, vlrs-2_cousin (Bulgarian) is a lone outlier in their own cluster, because they have 1.6% SSA and 1.9% Amerindian.)
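Mantel's test asks whether two distance matrices over the same samples are correlated. Below is a minimal sketch of the standard permutation version; it assumes Pearson correlation on the upper triangles and 999 permutations, which may differ from the settings used in the linked thread, and `mantel` is a hypothetical name. The two inputs would be, for example, distances between MDS-projected samples and f2 distances for the same samples.

```python
import numpy as np

# Sketch only: Pearson r on upper triangles plus a one-sided permutation test.
# The permutation count and the name `mantel` are assumptions.

def mantel(D1, D2, permutations=999, seed=0):
    """Return (r, p) for two square distance matrices over the same samples."""
    n = D1.shape[0]
    iu = np.triu_indices(n, k=1)                 # each pair counted once
    a = D1[iu]
    r_obs = np.corrcoef(a, D2[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(permutations):
        p = rng.permutation(n)                   # relabel the samples of D2
        r = np.corrcoef(a, D2[np.ix_(p, p)][iu])[0, 1]
        hits += r >= r_obs
    return r_obs, (hits + 1) / (permutations + 1)

pts = np.random.default_rng(1).normal(size=(8, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
r, p = mantel(D, 2 * D)   # perfectly proportional distances give r = 1
```

Rows and columns of the second matrix are shuffled together so that each permutation is a relabelling of samples, which is what the null hypothesis of "no association" requires.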

Here's a heatmap of all the users. The branches are ordered based on the combined percentage of Baltic and North_Atlantic:

https://i.ibb.co/dLyrc68/heat.png
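The colored groups described above (a hierarchical tree cut where it has 16 subtrees) and the heatmap's branch ordering can be sketched as follows. This is a naive agglomerative clustering with average linkage on Euclidean distances; the linkage method, the distance, and the descending sort order are assumptions, since the posts do not state them, and the helper names are hypothetical. Stopping the merges once k clusters remain gives the same groups as cutting the finished tree where it has k subtrees, because average linkage never produces inversions.

```python
import numpy as np

# Sketch only: average linkage, Euclidean distance and descending order are
# assumptions; `cut_tree_k` and `order_by_north` are hypothetical names.

def cut_tree_k(X, k):
    """Naive average-linkage agglomerative clustering, stopped at k clusters."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average linkage: mean distance over all cross-cluster pairs
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]                          # b > a, so index a stays valid
    return clusters

def order_by_north(X, clusters, cols=(0, 1)):
    """Sort clusters, and samples inside each, by North_Atlantic + Baltic."""
    north = X[:, list(cols)].sum(axis=1)
    inner = [sorted(c, key=lambda i: -north[i]) for c in clusters]
    return sorted(inner, key=lambda c: -north[c].mean())
```

With the thread's data one would call `cut_tree_k(X, 16)` on the full matrix of users' K13 rows. The loop is cubic and slow for large samples; SciPy's `scipy.cluster.hierarchy.linkage(..., method="average")` with `fcluster(..., criterion="maxclust")` is the usual library route.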

Jana
10-10-2021, 11:11 PM
Thank you Komintasavalta, amazing work.

bained
10-11-2021, 02:41 AM
Wait a second, I'm Macedonian? How many layers of irony is this?

vbnetkhio
10-11-2021, 07:51 AM
https://i.imgur.com/y1BEMbN.png

Did you use Milenko_uncle_in_law's original results?

https://www.theapricity.com/forum/showthread.php?352573-Wifys-uncle-MyHeritage-and-Autosomal&p=7305149&viewfull=1#post7305149

Ion Basescul
10-11-2021, 09:01 AM
You could add Romanian users for curiosity's sake, since we are in this range.

Moldova


IonBasescul_brother,20.94,32.81,16.13,11.69,12.21,0.97,2.06,1.03,1.08,0.54,0.54,0,0
IonBasescul_mom_1/2Moldovan_1/2Ukrainian,19.44,37.30,15.60,10.04,13.19,0.47,0.95,1.49,0.28,0.41,0.82,0,0
IonBasescul_dad,24.14,32.44,14.55,11.75,12.15,0,1.62,0,1.54,0.96,0,0.33,0.51
IonBasescul,25.68,29.49,15.21,9.61,12.77,1.72,0.62,1.82,2.81,0,0.27,0,0
Zmey_Gorynych,22.18,33.94,14.38,9.99,13.86,3.05,0,1.21,0,0.91,0.49,0,0
superjelly40,25.90,28.78,18.11,10.36,9.02,2.97,1.81,0,1.34,0.67,0,1.04,0
Art23_mom_1/2Moldovan_1/2Ukrainian,21.88,39.32,12.60,8.80,12.61,0.88,0.39,0.38,0.99,0.82,1.34,0,0
Daos_dad_1/2Moldovan_1/2Polish,22.11,39.39,12.60,7.73,8.94,2.67,3.34,0.58,0.76,1.18,0.69,0,0


Romania


WeirdLookingFellow,28.06,24.76,17.53,6.52,18.76,0,0.1,0,1.37,1.36,0.22,0,1.32
WeirdLookingFellow_GF,23.23,26.67,18.62,9.95,18.12,0,0,0.36,1.41,0.91,0.45,0.18,0.10
andre,23.04,29.67,14.32,8,19.96,3.18,0,0.19,1.18,0,0.29,0,0.17
alexmegas777,30.11,20.59,20.39,10.51,13.94,2.83,0.03,0,1.49,0.03,0.08,0,0
chrisbab,23.9,23.89,13.3,9.33,23.57,3.03,0,0,0.57,1.89,0.52,0,0
chrisbab_dad,19.74,26.82,16.59,11.12,21.74,0.37,0,1.04,0,1.30,0.81,0,0.47
chrisbab_mom,23.17,28.91,12.97,5.07,22.02,3.80,0,0,3.89,0,0.09,0,0.08
wd1089,24.33,28.74,14.79,10.77,14.11,2.68,1.09,1.85,1.23,0.2,0.21,0,0
Cybele,26.34,22.26,19.68,10.29,14.43,2.8,2.11,0,1.52,0.3,0.28,0,0
Cybele_mom,25.50,23.15,17.63,10.05,17.66,2.79,0.10,0.55,2.01,0.27,0.30,0,0
Cybele_dad,25.25,24.30,18.85,9.80,14.83,2.78,2.41,0,0.18,0.68,0.58,0.34,0
Fieraru,24.91,26.83,19.79,9.17,17.14,0.87,0,0.39,0.48,0,0.22,0.2,0
Carpatz,22.59,28.44,19.8,7.19,15.77,3.34,0.34,1.38,1.16,0,0,0,0
Catalin_Tulcea,20.65,26.7,17.19,10.49,19.25,2.39,0.44,0.37,0.84,1.01,0.08,0.59,0
Gobius,22.13,26.63,20.09,7.41,18.41,1.76,0.75,0,1.98,0.85,0,0,0
ovidiu,22.91,23.26,17.38,11.35,21.49,1.52,0,0.64,0.73,0,0.4,0,0.32
Voidspawn,22.61,32.05,14.50,7.73,16.11,2.23,0.96,0.44,3.32,0,0,0,0
Voidspawn_dad,25.22,31.97,13.48,8.04,13.99,2.69,1.15,0.35,1.91,1.18,0,0,0
Voidspawn_mom,22.05,34.89,12.67,6.66,17.94,0.62,0.71,0,3.83,0,0,0,0.60
Incelslayer,26.60,35.89,17.78,7.14,10.15,0,0,1.10,1.26,0,0,0,0.08
Impaler_cousin,24.66,25.16,16.95,8.76,20.01,1.15,1.85,0.62,0.85,0,0,0,0
Imirvlad(Anthrogenica),29.33,30.15,16.62,6.80,12.62,0,1.15,0,1.42,0.27,0.52,1.13,0
Nurzat_1/2Romanian_1/2Ukrainian,25.10,30.93,16.34,10.95,11.35,0.08,1.20,1.01,1.30,0.74,0.64,0,0.36
Kökény_Szekler,29.96,23.72,14.70,9.34,13.97,1.39,1.17,1.69,2.65,0.34,1.07,0,0
Kökény_mom_Szekler,27.79,28.45,15.04,8.08,13.96,1.72,0.37,0.19,3.11,0.66,0.55,0,0.09
Kökény_dad_Szekler,29.81,26.19,16.10,10.47,11.20,0,0.03,3.44,2.15,0,0.01,0,0.61

Crn Volk
10-11-2021, 09:01 AM
Wait a second, I'm Macedonian? How many layers of irony is this?

You have ancestry from Macedonia as I recall, no?

Komintasavalta
10-11-2021, 09:48 AM
Here's another plot that includes Hungarians, Romanians, Moldovans, and Albanians:

https://i.ibb.co/z5L1PHH/1.png

Kökény and Kökény_father are outliers because they are Székely.

I again multiplied the matrix of admixture percentages by an MDS matrix of the FST matrix, which basically turns K13 into a PCA. MDS is essentially a version of PCA that takes a distance matrix as input, so I used MDS to calculate coordinates for the 13 components of K13 in 11-dimensional space, and I then derived each sample's coordinates as a linear combination of those component coordinates, weighted by the sample's admixture percentages. Finally, I did a PCA based on those coordinates.
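The three steps described here (MDS of the FST matrix between components, an admixture-weighted combination per sample, then PCA) can be sketched like this. The sketch uses classical (Torgerson) MDS and treats FST values directly as distances; both are assumptions, since the post does not say which MDS variant was used or whether FST was transformed first. The K13 FST matrix is not given in the thread, so the 3-component matrix below is a made-up toy example, and all function names are hypothetical.

```python
import numpy as np

# Sketch only: classical MDS on raw FST values is an assumption, the toy FST
# matrix is invented for illustration, and the helper names are hypothetical.

def classical_mds(D, dims):
    """Coordinates whose Euclidean distances approximate the matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:dims]          # keep the largest `dims`
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))

def project_samples(W, comp_coords):
    """Admixture-weighted combination of component coordinates per sample.
    With rows of W summing to 100 this is a weighted average."""
    return (W / 100.0) @ comp_coords

def pca(X, n_components=2):
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Toy 3-component "FST" matrix (hypothetical values, not from K13).
FST = np.array([[0.00, 0.01, 0.03],
                [0.01, 0.00, 0.02],
                [0.03, 0.02, 0.00]])
comp = classical_mds(FST, dims=2)                # with K13: 13 components, 11 dims
W = np.array([[100.0, 0.0, 0.0], [50.0, 50.0, 0.0], [20.0, 30.0, 50.0]])
samples = project_samples(W, comp)
pcs = pca(samples)
```

Because a component that sits far away in FST space (such as SSA relative to European components) moves a sample a long way even at 1 to 2%, this construction also explains why noise-level non-European admixture gets amplified, as noted earlier in the thread.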

Ion Basescul
10-11-2021, 10:12 AM
As expected, I plot within Kökény's family, like a full-blooded Szekler. No wonder I felt at home when I lived in Budapest for a year.

MandM
10-11-2021, 10:28 AM
I don't know which coordinates you are using for me, but these are the latest ones I use:
Milenko,21.26,24.72,19.35,10.85,16.66,2.26,0.56,0.71,1.06,0.99,1.03,0.34,0.22

Komintasavalta
10-11-2021, 10:58 AM
As expected, I plot within Kökény's family, like a full-blooded Szekler. No wonder I felt at home when I lived in Budapest for a year.

Maybe it's just noise, but you have one of the highest combined percentages of the East_Asian, Siberian, Amerindian, and Oceanian components:

https://i.ibb.co/7VzgvBy/1.png
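The "combined percentage" in this chart is a sum over four K13 columns. A minimal sketch, assuming the usual Eurogenes K13 column order (the order the coordinate lines in this thread follow); `K13` and `combined` are hypothetical names. It uses Ion Basescul's row from the Moldovan block:

```python
# Sketch only: assumes the standard Eurogenes K13 column order;
# `combined` is a hypothetical helper.
K13 = ["North_Atlantic", "Baltic", "West_Med", "West_Asian", "East_Med",
       "Red_Sea", "South_Asian", "East_Asian", "Siberian", "Amerindian",
       "Oceanian", "Northeast_African", "Sub-Saharan"]

def combined(row, components):
    """Sum the percentages of the named K13 components in one result row."""
    return sum(row[K13.index(name)] for name in components)

ion = [25.68, 29.49, 15.21, 9.61, 12.77, 1.72, 0.62,
       1.82, 2.81, 0, 0.27, 0, 0]
eastern = combined(ion, ["East_Asian", "Siberian", "Amerindian", "Oceanian"])
print(round(eastern, 2))  # 4.9
```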

Ion Basescul
10-11-2021, 11:02 AM
I don't know which coordinates you are using for me, but these are the latest ones I use:
Milenko,21.26,24.72,19.35,10.85,16.66,2.26,0.56,0.71,1.06,0.99,1.03,0.34,0.22

These are definitely wrong, since they are too different from the original.
I don't think people should impute, because it doesn't always add value: it only predicts the values of the missing SNPs from the existing data. In my own testing, the difference between the test with the lowest SNP count (55k with LivingDNA) and the one with the highest (172k with AncestryDNA) was minimal. The biggest difference was against MyHeritage, but even there no component varied by more than 1-2% in either direction. So whichever company you test with, the results won't differ much from a test with the full SNP count for the Eurogenes or Dodecad calculators.

The imputed result above is certainly wrong, since some components differ by as much as 5%.
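The comparison described here (how far each component moves between two results for the same person) is a per-component absolute difference. A minimal sketch, assuming the same K13 column order as the posted coordinate lines; the two vectors are the two sets of Milenko values posted in this exchange, and `max_shift` is a hypothetical helper:

```python
# Sketch only: assumes the standard Eurogenes K13 column order;
# `max_shift` is a hypothetical helper.
K13 = ["North_Atlantic", "Baltic", "West_Med", "West_Asian", "East_Med",
       "Red_Sea", "South_Asian", "East_Asian", "Siberian", "Amerindian",
       "Oceanian", "Northeast_African", "Sub-Saharan"]

def max_shift(a, b):
    """Largest absolute per-component difference between two K13 results."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    i = max(range(len(diffs)), key=diffs.__getitem__)
    return K13[i], diffs[i]

ftdna = [21.26, 24.72, 19.35, 10.85, 16.66, 2.26, 0.56, 0.71,
         1.06, 0.99, 1.03, 0.34, 0.22]
listed = [20.84, 29.18, 19.78, 9.81, 14.96, 2.36, 1.31, 0,
          0.42, 0.83, 0.51, 0, 0]
component, shift = max_shift(ftdna, listed)
print(component, round(shift, 2))  # Baltic 4.46
```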

Ion Basescul
10-11-2021, 11:03 AM
Maybe it's just noise, but you have one of the highest combined percentages of the East_Asian, Siberian, Amerindian, and Oceanian components:

https://i.ibb.co/7VzgvBy/1.png

I have 0.27% Oceanian, which is definitely lower than most on that chart, and 0% Amerindian. But for the sum of East Asian (1.83%) and Siberian (2.85%), I fall at the upper end of Romanians/Moldovans and comfortably within the Csango/Szekler range.

Kaspias
10-11-2021, 11:54 AM
Maybe it's just noise, but you have one of the highest combined percentages of the East_Asian, Siberian, Amerindian, and Oceanian components:

https://i.ibb.co/7VzgvBy/1.png

You can also add Balkan Turks:


Kaspias(balkan_turk/pomak),15.55,20.37,12.29,14.91,21.00,3.25,2.06,1.16,7.57,0.70,0.53,0,0.61
Kaspias_dad(balkan_turk),16.93,13.93,13.52,15.43,19.90,2.80,4.65,1.01,9.90,0.58,0.40,0.39,0.56
Kaspias_mom_phased(pomak),13.81,28.19,13.21,14.30,24.01,2.99,0,0,3.49,0,0,0,0
Kaspias_cousin(balkan_turk),17.97,15.09,16.47,14.51,19.29,4.72,3.36,1.44,6.13,0.75,0,0.25,0
Kayra(balkan_turk/bosniak),25.96,18.75,13.86,12.63,15.78,2.88,1.32,3.22,5.37,0,0.23,0,0
Deniz_mother(balkan_turk),20.45,22.68,13.98,14.7,16.89,3.43,0.77,2.68,2.91,0.51,0,1,0
Deniz_father(balkan_turk),14.22,25.08,14.69,16.55,20.91,1.36,0,0,5.15,0,0.83,0,1.2
Deniz_grandpa(balkan_turk),21.22,22.84,14.09,12.95,19.33,1.04,2.25,2.56,2.24,0.25,0.13,0,1.09
Deniz(balkan_turk),17.96,24,13.09,15.85,20.15,2.49,0.51,0,4.66,0,0,1.29,0
Karakartal(balkan_turk),15.53,15.92,15.36,17.97,23.45,4.04,1.20,0.82,4.08,0.44,1.20,0,0
Thracian(balkan_turk),19.94,16.83,13.89,16.36,21.68,3.87,3.67,0,3.02,0.39,0.34,0,0
migrec(balkan_turk),14.85,21.06,18.04,21.16,15.96,1.45,0.74,2.15,3.97,0,0.62,0,0
Meric(balkan_turk),13.43,18.24,10.68,16.99,21.38,1.88,1.88,3.00,10.03,1.41,0,0,1.09
Altay(balkan_turk),15.53,15.42,14.46,15.71,22.48,2.63,2.89,1.18,7.69,0.54,0.22,0,1.25

Proto-Shaman
10-11-2021, 01:06 PM
try github.io

MandM
10-11-2021, 01:07 PM
I believe the one she uses is wrong; that one was imputed and gave me some exotic components I don't really have. The one I gave is OK, I believe; it's my FTDNA. It seems I have so many that I've lost track of which one is which.

Ion Basescul
10-11-2021, 01:16 PM
I believe the one she uses is wrong; that one was imputed and gave me some exotic components I don't really have. The one I gave is OK, I believe; it's my FTDNA. It seems I have so many that I've lost track of which one is which.

Oh, I see. I thought the one below was the original:

20.84,29.18,19.78,9.81,14.96,2.36,1.31,0,0.42,0.83,0.51,0,0

Tell Stearsolina to update your values in the K13 list, because everyone thinks the values above are the original ones.

MandM
10-11-2021, 02:07 PM
Oh, I see. I thought the one below was the original:

20.84,29.18,19.78,9.81,14.96,2.36,1.31,0,0.42,0.83,0.51,0,0

Tell Stearsolina to update your values in the K13 list, because everyone thinks the values above are the original ones.
No worries! It's not like you could have known. I will tell her to put in the latest one.

knez01
10-11-2021, 02:57 PM
His imputation is less accurate because he used the DNAgenics imputation service, not DNA.land.

Ion Basescul
10-11-2021, 03:34 PM
His imputation is less accurate because he used the DNAgenics imputation service, not DNA.land.

It doesn't matter; it's still a prediction. The original is the best until someone takes a test that actually looks at more SNPs, even if the imputed one might make more sense or put you closer to your native population. It can be, and likely is, a false positive. I'd advise you to stick to the original raw data as well.

knez01
10-11-2021, 07:02 PM
It doesn't matter; it's still a prediction. The original is the best until someone takes a test that actually looks at more SNPs, even if the imputed one might make more sense or put you closer to your native population. It can be, and likely is, a false positive. I'd advise you to stick to the original raw data as well.

I see where you are coming from, but I disagree. I have a lot of samples with both original high-SNP data and imputed low-SNP data, and they are identical; imputation is overall mathematically accurate, with little noise, and that noise is easily detectable. In my experience, I only use imputed v5 after examining it carefully. Each to his own, I guess!

TheMaestro
10-11-2021, 07:45 PM
Maybe it's just noise, but you have one of the highest combined percentages of the East_Asian, Siberian, Amerindian, and Oceanian components:

https://i.ibb.co/7VzgvBy/1.png

So I really do plot with Romanians. My Magyar grandpa right now:

https://i.imgur.com/5GEWKFq.gif

CommonSense
10-11-2021, 07:50 PM
Terminator isn't 1/4 Croatian; he's (almost certainly) 1/8 Jewish, and the rest of his ancestry is Serbian. Funnily enough, without him our cluster would be smaller by 1/5 :D

CommonSense
10-11-2021, 07:55 PM
Bosnians overlapping with both Serbs and Croatians makes perfect sense. The Slovenian cluster is a bit strange though.

The Bosniak cluster is in reality as big as the entirety of this PCA, since there are those who identify as Bosniak but are genetically like Macedonians and Albos and those who are like northern Croats.

Benyzero
10-11-2021, 08:01 PM
Barely found myself.

Gergő Marosvári
10-11-2021, 08:09 PM
Maybe it's just noise, but you have one of the highest combined percentages of the East_Asian, Siberian, Amerindian, and Oceanian components:

https://i.ibb.co/7VzgvBy/1.png

I am such an outlier here.
And surprisingly, my grandma has the most North Atlantic + Baltic!!!

Gergő Marosvári
10-11-2021, 08:12 PM
Barely found myself.

I'd point out that you're right next to my mom... :D

Komintasavalta
10-11-2021, 11:30 PM
Here's another plot that includes all users who have 4% or more of the East_Asian, Siberian, and Amerindian components added together, and 30% or more of the North_Atlantic and Baltic components added together.

The clustering is based on a matrix where the admixture weights have been multiplied by MDS of FST, so it gives a lot of weight to differences in the level of SSA ancestry, and there are three Brazilian triracial users who have their own cluster.

I still don't understand how Elnar (Tatar-Mordvin) can be more Mongoloid than the Chuvash average.

https://i.ibb.co/j82fZ6W/1.png
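The inclusion rule for this plot can be written as a simple filter over K13 rows. A minimal sketch, assuming the standard Eurogenes K13 column order and treating "4% or more" and "30% or more" as inclusive thresholds; `selected` and `col` are hypothetical helpers, and the rows are copied from coordinates posted in this thread:

```python
# Sketch only: assumes the standard Eurogenes K13 column order and inclusive
# thresholds; `col` and `selected` are hypothetical helpers.
K13 = ["North_Atlantic", "Baltic", "West_Med", "West_Asian", "East_Med",
       "Red_Sea", "South_Asian", "East_Asian", "Siberian", "Amerindian",
       "Oceanian", "Northeast_African", "Sub-Saharan"]

def col(row, name):
    return row[K13.index(name)]

def selected(row):
    """>= 4% East_Asian + Siberian + Amerindian and >= 30% North_Atlantic + Baltic."""
    east = col(row, "East_Asian") + col(row, "Siberian") + col(row, "Amerindian")
    north = col(row, "North_Atlantic") + col(row, "Baltic")
    return east >= 4 and north >= 30

rows = {
    "IonBasescul": [25.68, 29.49, 15.21, 9.61, 12.77, 1.72, 0.62, 1.82, 2.81, 0, 0.27, 0, 0],
    "IonBasescul_brother": [20.94, 32.81, 16.13, 11.69, 12.21, 0.97, 2.06, 1.03, 1.08, 0.54, 0.54, 0, 0],
    "Kaspias_dad(balkan_turk)": [16.93, 13.93, 13.52, 15.43, 19.90, 2.80, 4.65, 1.01, 9.90, 0.58, 0.40, 0.39, 0.56],
    "Kaspias_mom_phased(pomak)": [13.81, 28.19, 13.21, 14.30, 24.01, 2.99, 0, 0, 3.49, 0, 0, 0, 0],
}
picked = [name for name, row in rows.items() if selected(row)]
print(picked)  # ['IonBasescul', 'Kaspias_dad(balkan_turk)']
```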

Ion Basescul
10-12-2021, 12:04 AM
Here's another plot that includes all users who have 4% or more of the East_Asian, Siberian, and Amerindian components added together, and 30% or more of the North_Atlantic and Baltic components added together.





https://www.youtube.com/watch?v=Y4ket21Tg6w

Kaspias
10-12-2021, 12:47 PM
The Amerindian component in Eurogenes K13 doesn't seem to be related to Eurasian groups, even though it successfully represents actual Amerindians. Therefore, you may get a better view of Eurasia, or let's say the Balkans since they're the subject of this thread, if you remove it from the projection. East Asian + Siberian will be good enough.

I don't know whether one could get a meaningful projection out of it, but here's another hint: try plotting Balkan populations by the distribution of East Asian and Siberian. For some reason Siberian pops up more frequently, especially in South Slavs, while East Asian appears more often in Romania and Hungary.

Both are present in Balkan Turks, but Siberian is decisively dominant in most cases. That's interesting, since Anatolian Turks have more balanced East Asian/Siberian distributions.
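Both suggestions can be sketched per sample: dropping Amerindian before projecting, and measuring how much of the East Asian + Siberian total is Siberian. Rescaling the remaining 12 components back to 100 is an assumption (the post only says to remove Amerindian), as is the standard Eurogenes K13 column order; `drop_component` and `siberian_share` are hypothetical helpers. The example row is Kaspias_dad from this thread:

```python
# Sketch only: assumes the standard Eurogenes K13 column order, and that the
# remaining components are rescaled to 100 after dropping one (an assumption).
K13 = ["North_Atlantic", "Baltic", "West_Med", "West_Asian", "East_Med",
       "Red_Sea", "South_Asian", "East_Asian", "Siberian", "Amerindian",
       "Oceanian", "Northeast_African", "Sub-Saharan"]

def drop_component(row, name):
    """Remove one component and rescale the rest so they sum to 100 again."""
    i = K13.index(name)
    rest = row[:i] + row[i + 1:]
    total = sum(rest)
    return [100 * x / total for x in rest]

def siberian_share(row):
    """Siberian as a fraction of East_Asian + Siberian (None if both are 0)."""
    ea, sib = row[K13.index("East_Asian")], row[K13.index("Siberian")]
    return sib / (ea + sib) if ea + sib > 0 else None

kaspias_dad = [16.93, 13.93, 13.52, 15.43, 19.90, 2.80, 4.65,
               1.01, 9.90, 0.58, 0.40, 0.39, 0.56]
no_amerindian = drop_component(kaspias_dad, "Amerindian")
print(round(siberian_share(kaspias_dad), 3))  # 0.907
```

Plotting `siberian_share` against the East Asian + Siberian total for each population would show whether Siberian really dominates in South Slavs and Balkan Turks while East Asian is relatively stronger in Romanians and Hungarians.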

PAGANE
10-12-2021, 02:17 PM
Would you add my cousins and uncles?
Pagane uncle-Ros,22.08,20.21,21.55,12.60,17.66,1.92,0.41,1.29,0.66,1.52,0.11,0.00,0.00
Pagane uncle-Pan,21.38,19.84,18.18,14.00,19.79,3.82,0.73,0.10,2.09,0.00,0.07,0.00,0.00
Pagane cousin-St,20.48,26.77,17.65,9.81,21.66,1.61,0.00,0.35,0.00,0.95,0.32,0.40,0.00
Pagane cousin-Z,23.01,25.05,19.80,9.44,17.43,1.66,0.58,0.52,0.00,0.90,0.99,0.62,0.00