Does the Dodecad K12 or the Gedrosia K12 calculator make more sense?

**Zoro** · 10-02-2021, 02:08 PM

Originally Posted by rothaer

What this Gedrosia component actually is, is a very interesting question.

If you look at the distribution map in the OP, it does not fit a steppe-related component: you have it in Western Europe and 0% of it in Poland and Belarus, yet these populations are known to have a large proportion of steppe ancestry.
So whatever this Gedrosia component depicts (might it be an arbitrarily and erroneously chosen component?), the question remains what migration gave it that distribution.

Bottom line is it doesn’t make sense, no matter what excuse we try to make to justify it.

What I just proved is that even if the Gedrosia percentages, or any other admixture percentages in a calculator, are wrong, the public will have no idea that the calculator is flawed as long as the oracles model people with their own ethnic group. (There is no reason to believe E. Eurasian or any other component is accurate either; chances are that if one is off, the others are off too.) This applies to Eurogenes and G25 also.

You may ask how a calculator, whether it is Dodecad, Eurogenes or G25, is able to model me or cluster me with my ethnic group if its admixture percentages are off.

ANSWER: As long as all people from your country get similarly wrong percentages of Gedrosia, E. Asian, W. Asian and African as you do, the oracles will model you with your countrymen.

That’s how Iranians and Kurds are still modelled with Iranians and Kurds in GEDmatch or G25 despite silly E. Asian percentages: all Iranians and Kurds get similarly silly E. Asian.

That’s how British are modelled with British in this calculator’s oracles even if they have silly Gedrosia percentages: all British have the same wrong Gedrosia percentages.

CONCLUSION: Don’t ever judge a calculator to be good just because its oracles model you with your countrymen; the admixture percentages can be wrong and you would never know it.
Conversely, don’t judge calculators such as the GedrosiaDNA project to be bad just because their oracles don’t model you with your countrymen. It could be that the spreadsheet doesn’t have enough populations, while the admixture percentages are still better than those of other calculators.
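
The argument above can be illustrated with a toy example in R. Everything in it is an assumption made up for illustration: the component names, the "true" percentages, the size of the calculator error, and the `oracle` helper (a plain nearest-population lookup, not the actual oracle code of any calculator). It only sketches why a shared error does not stop an individual from matching their own population.

```r
# Hypothetical component names and numbers, chosen only for illustration
comp <- c("Gedrosia", "European", "W_Asian", "E_Asian")

# Made-up "true" ancestry of two reference populations
truth <- rbind(
  British = c(0.00, 0.90, 0.10, 0.00),
  Iranian = c(0.20, 0.10, 0.70, 0.00)
)
colnames(truth) <- comp

# Systematic calculator error, identical for everyone from the same population
bias <- rbind(
  British = c(0.10, -0.10, 0.00, 0.00),
  Iranian = c(0.00, 0.00, -0.08, 0.08)
)

# What the calculator spreadsheet reports for the reference populations
reference <- truth + bias

# One British individual whose true ancestry deviates slightly from the average
indiv_true <- truth["British", ] + c(0.00, -0.02, 0.02, 0.00)
indiv_reported <- indiv_true + bias["British", ]

# Hypothetical single-population "oracle": nearest reference by Euclidean distance
oracle <- function(x, ref) {
  d <- sqrt(rowSums(sweep(ref, 2, x)^2))
  names(which.min(d))
}

oracle(indiv_reported, reference)   # best-fitting reference population
indiv_reported - indiv_true         # per-component error that the match does not reveal
```

Because the individual and the reference rows carry the same error, the error cancels out in the distance, so the match says nothing about whether the percentages themselves are right.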

**~~Komintasavalta~~** · 10-02-2021, 03:13 PM

Originally Posted by Zoro

I like how you think outside the box, but I’m not sure I follow what you’re saying, because the Q matrix is the admixture proportion matrix, which changes based on K. For example, at K=4 a row could be:

0.15 0.15 0.5 0.2

Whereas the FST matrix holds the distances between components (the columns of Q). It’s not SNP weights. So how do you combine those two different matrices? Can you give an example based on the first couple of rows of the Q matrix?

You can use matrix multiplication:

Code:

t=read.csv("https://pastebin.com/raw/UY1Em6qW",r=1)/100 # K13 original

fst=as.dist(read.csv(text=",North_Atlantic,Baltic,West_Med,West_Asian,East_Med,Red_Sea,South_Asian,East_Asian,Siberian,Amerindian,Oceanian,Northeast_African,Sub-Saharan
North_Atlantic,,,,,,,,,,,,,
Baltic,19,,,,,,,,,,,,
West_Med,28,36,,,,,,,,,,,
West_Asian,26,32,36,,,,,,,,,,
East_Med,26,35,28,21,,,,,,,,,
Red_Sea,52,62,50,48,39,,,,,,,,
South_Asian,64,65,76,57,60,82,,,,,,,
East_Asian,114,114,122,110,111,127,76,,,,,,
Siberian,111,111,123,109,112,130,83,56,,,,,
Amerindian,138,137,154,138,144,161,120,113,105,,,,
Oceanian,179,181,187,177,176,191,146,166,177,217,,,
Northeast_African,122,127,124,116,108,121,113,145,151,185,203,,
Sub-Saharan,146,150,150,140,135,141,133,164,170,204,220,41,",r=1))/1000
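# rows = populations, columns = components; each cell is the admixture-weighted mean FST to that component
t2=as.matrix(t)%*%as.matrix(fst)

# all populations sorted by Euclidean distance to Mari in the transformed space
sort(as.matrix(dist(t2))[,"Mari"])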
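
To show what the product contains, here is a small worked sketch that uses the K=4 row from the question. The choice of four components is an assumption made only to keep the example short (their pairwise FST values are taken from the K13 table above), and the second row `B` is made up.

```r
# Four K13 components picked for illustration; FST values from the table above
comp <- c("Baltic", "East_Asian", "Siberian", "Sub-Saharan")
fst <- matrix(c(
    0, 114, 111, 150,
  114,   0,  56, 164,
  111,  56,   0, 170,
  150, 164, 170,   0
), 4, 4, byrow = TRUE, dimnames = list(comp, comp)) / 1000

# Two hypothetical rows of a Q matrix (each row sums to 1)
q <- rbind(
  A = c(0.15, 0.15, 0.50, 0.20),   # the K=4 row from the question
  B = c(0.80, 0.05, 0.10, 0.05)    # made-up second row
)
colnames(q) <- comp

# Matrix product: one row per sample, one column per component
qf <- q %*% fst

# Cell [A, Baltic] is the admixture-weighted mean FST from sample A to Baltic:
# 0.15*0 + 0.15*0.114 + 0.50*0.111 + 0.20*0.150 = 0.1026
qf["A", "Baltic"]
qf
```

So after the multiplication the columns no longer hold admixture percentages; each column holds the average FST distance from the sample to one component, weighted by the sample's own percentages.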

For example multiplying by the FST matrix moves Maris closer to Central Asians and South Asians but further from Europeans and Siberians, because it causes the distances between Eurasian populations to be largely determined by the Mongoloid-Caucasoid axis. It moves Maris closer to Turkmens and further from Kets and Selkups, which matches the results of f2. However I don't know if it's the right method to account for FST, because it sometimes gives weird results. For example it moves Maris closer to Balochi and Makrani than to Estonians, and it also moves Maris closer to Jordanians than to Bulgarians:

Code:

library(tidyverse)
library(ggforce)
library(colorspace) # for rainbow_hcl

k13=read.csv("https://pastebin.com/raw/aLBEQ2cu",r=1,check=F)/100
f2=read.csv("https://drive.google.com/uc?export=download&id=1qnXblYFWLFnOiEj-NbjCVkHcGIsGe64R",r=1)
# g25=read.csv("https://drive.google.com/uc?export=download&id=1wZr-UOve0KUKo_Qbgeo27m-CQncZWb8y",r=1) # modern averages scaled

k13fst=as.dist(read.csv(text=",North_Atlantic,Baltic,West_Med,West_Asian,East_Med,Red_Sea,South_Asian,East_Asian,Siberian,Amerindian,Oceanian,Northeast_African,Sub-Saharan
North_Atlantic,,,,,,,,,,,,,
Baltic,19,,,,,,,,,,,,
West_Med,28,36,,,,,,,,,,,
West_Asian,26,32,36,,,,,,,,,,
East_Med,26,35,28,21,,,,,,,,,
Red_Sea,52,62,50,48,39,,,,,,,,
South_Asian,64,65,76,57,60,82,,,,,,,
East_Asian,114,114,122,110,111,127,76,,,,,,
Siberian,111,111,123,109,112,130,83,56,,,,,
Amerindian,138,137,154,138,144,161,120,113,105,,,,
Oceanian,179,181,187,177,176,191,146,166,177,217,,,
Northeast_African,122,127,124,116,108,121,113,145,151,185,203,,
Sub-Saharan,146,150,150,140,135,141,133,164,170,204,220,41,",r=1))/1000

pop=intersect(rownames(f2),rownames(k13))
# pop=intersect(rownames(g25),rownames(k13))
k13=k13[pop,]
f2=f2[pop,pop]
# g25=g25[pop,]

k13mult=as.matrix(k13)%*%as.matrix(k13fst)
xy=data.frame(x=rank(f2[,"Mari"]),y=rank(as.matrix(dist(k13))[,"Mari"]))
# xy=data.frame(x=rank(as.matrix(dist(g25))[,"Mari"]),y=rank(as.matrix(dist(k13mult))[,"Mari"]))

xy$k=as.factor(cutree(hclust(as.dist(f2)),16))
# xy$k=as.factor(cutree(hclust(dist(g25)),16))

ggplot(xy,aes(x,y))+
ggforce::geom_mark_hull(aes(color=k,fill=k),concavity=1000,radius=unit(.15,"cm"),expand=unit(.15,"cm"),alpha=.2,size=.15)+
geom_abline(linetype="dashed",color="gray80",size=.3)+
geom_point(aes(color=k),size=.5)+
geom_text(aes(color=k),label=rownames(xy),size=2,vjust=-.7)+
scale_x_continuous(breaks=seq(1,200,10),expand=expansion(mult=c(.04,.04)))+
scale_y_continuous(breaks=seq(1,200,10),expand=expansion(mult=c(.04,.04)))+
scale_fill_manual(values=rainbow_hcl(nlevels(xy$k),90,60))+
scale_color_manual(values=rainbow_hcl(nlevels(xy$k),90,60))+
labs(x="Rank of f2 distance to Mari",y="Rank of K13 distance to Mari, not multiplied by FST")+
theme(
  axis.text=element_text(size=6),
  axis.text.y=element_text(angle=90,vjust=1,hjust=.5),
  axis.ticks=element_blank(),
  axis.ticks.length=unit(0,"cm"),
  axis.title=element_text(size=8),
  legend.position="none",
  panel.background=element_rect(fill="white"),
  panel.border=element_rect(color="gray85",fill=NA,size=.6),
  panel.grid.major=element_line(color="gray85",size=.2),
  plot.background=element_rect(fill="white"),
  plot.subtitle=element_text(size=7),
  plot.title=element_text(size=11)
)

ggsave("1.png",w=6,h=6)

However when you multiply the matrix of admixture percentages by the FST matrix, it makes a global PCA based on the K13 spreadsheet have the conventional shape where on PC1 and PC2, the other major cline is between Africans and Europeans:

If you don't multiply by FST, then the other major cline on PC1 and PC2 is between Europeans and South Asians instead:

**Zoro** · 10-02-2021, 03:52 PM

Originally Posted by Komintasavalta

You can use matrix multiplication:

For example it moves Maris relatively closer to Central Asians and South Asians and further from Europeans, because it causes the distances between Eurasian populations to be largely determined by the Mongoloid-Caucasoid axis. It moves Maris closer to Turkmens and further from Kets and Selkups, which matches the results of f2. However I don't know if it's the right method to account for FST, because it sometimes gives weird results. For example it moves Maris closer to Balochi and Makrani than to Estonians, and it also moves Maris closer to Jordanians than to Scots:

However when you multiply the matrix of admixture percentages by the FST matrix, it makes a global PCA based on the K13 spreadsheet have the conventional shape where on PC1 and PC2, the other major cline is between Africans and Europeans:


I didn’t mean what code you use, I meant what logic you have for doing that. The 1st PCA you posted didn’t make sense to me in either mode, looking at the distance Balochi-Iranian vs Balochi-Caucasians or Balochi-Arabs. The last PCA you posted probably makes more sense.

**Zoro** · 10-02-2021, 03:53 PM

…….

**~~Komintasavalta~~** · 10-02-2021, 05:15 PM

Originally Posted by Zoro

I didn’t mean what code you use, I meant what logic you have for doing that. The 1st PCA you posted didn’t make sense to me in either mode, looking at the distance Balochi-Iranian vs Balochi-Caucasians or Balochi-Arabs. The last PCA you posted probably makes more sense.

I'm not sure if it's the correct way to account for FST, but I think it makes sense at least for making a PCA based on the datasheet of an admixture calculator. For example if you make a PCA of European populations from K13 updated without multiplying by FST, Tatars plot between Finns and Caucasians:

But after multiplying by FST, more weight is given to differences in Mongoloid admixture, and PC1 ends up differentiating populations based on the amount of Mongoloid ancestry. (However now it no longer makes sense to make a biplot that shows the loadings of the variables of the PCA, because the matrix that the PCA is based on no longer represents the percentages of the admixture components.)

**Hektor12** · 10-02-2021, 05:19 PM

Originally Posted by rothaer

So whatever this Gedrosia component depicts (might it be an arbitrarily and erroneously chosen component?), the question remains what migration gave it that distribution.

I find the answer in the R1a-R1b question.

**Zoro** · 10-02-2021, 06:32 PM

Originally Posted by Komintasavalta

I'm not sure if it's the correct way to account for FST, but I think it makes sense at least for making a PCA based on the datasheet of an admixture calculator. For example if you make a PCA of European populations from K13 updated without multiplying by FST, Tatars plot between Finns and Caucasians:

But after multiplying by FST, more weight is given to differences in Mongoloid admixture, and PC1 ends up differentiating populations based on the amount of Mongoloid ancestry. (However now it no longer makes sense to make a biplot that shows the loadings of the variables of the PCA, because the matrix that the PCA is based on no longer represents the percentages of the admixture components.)

I like the original one without multiplying by FST. I don’t see the logic of the FST multiplication; I don’t even understand how you can multiply admixture percentages by FST distances between components. It doesn’t make sense to me, unless I misunderstood you.

**~~Komintasavalta~~** · 10-02-2021, 07:14 PM

Originally Posted by Zoro

I like the original one without multiplying by FST. I don’t see the logic of the FST multiplication; I don’t even understand how you can multiply admixture percentages by FST distances between components. It doesn’t make sense to me, unless I misunderstood you.

It uses matrix multiplication (`%*%` operator in R):

Another example: In K13 original, Dolgan has 75% Siberian and Dai has 90% East Asian, but without multiplying by FST, Dolgan are further from Dai than from many SSAs. Without multiplying by FST, Burusho is much closer to Dolgans than Gujarati is, because Burusho has 6% Siberian and Gujarati has 1%, but after multiplying by FST, Gujarati moves to rank 66 and Burusho moves to rank 55. However there's also something wrong with how after multiplying by FST, Dolgans become closer to Ethiopian_Tigray than to most Europeans.
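
The Dolgan and Dai case can be sketched with three hypothetical unadmixed populations (these are not the real spreadsheet rows, and the four-component subset of the K13 FST table is an assumption to keep the example small). On raw percentages, a pure Siberian sample is exactly as far from a pure East Asian sample as from a pure Sub-Saharan one; after multiplying by FST that is no longer the case.

```r
# Four K13 components picked for illustration; FST values from the K13 table
comp <- c("Baltic", "East_Asian", "Siberian", "Sub-Saharan")
fst <- matrix(c(
    0, 114, 111, 150,
  114,   0,  56, 164,
  111,  56,   0, 170,
  150, 164, 170,   0
), 4, 4, byrow = TRUE, dimnames = list(comp, comp)) / 1000

# Hypothetical populations that are 100% one component each
q <- rbind(
  pure_Siberian    = c(0, 0, 1, 0),
  pure_East_Asian  = c(0, 1, 0, 0),
  pure_Sub_Saharan = c(0, 0, 0, 1)
)
colnames(q) <- comp

# Euclidean distances on raw percentages: every pair of pure populations
# is sqrt(2) apart, regardless of how related the components are
d_raw <- as.matrix(dist(q))

# Euclidean distances after multiplying by the FST matrix
d_fst <- as.matrix(dist(q %*% fst))

d_raw["pure_Siberian", ]
d_fst["pure_Siberian", ]
```

In this sketch the raw distances treat all components as equally different from each other, which is why a mostly Siberian population can end up no closer to a mostly East Asian one than to SSAs.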

**~~Komintasavalta~~** · 10-02-2021, 07:42 PM

Actually I think I now found a better way to account for FST, which is to first do multidimensional scaling on the FST matrix, and to then multiply the matrix of component percentages with the MDS matrix:

Code:

t=read.csv("https://pastebin.com/raw/UY1Em6qW",r=1)/100 # K13 original

fst=as.dist(read.csv(text=",North_Atlantic,Baltic,West_Med,West_Asian,East_Med,Red_Sea,South_Asian,East_Asian,Siberian,Amerindian,Oceanian,Northeast_African,Sub-Saharan
North_Atlantic,,,,,,,,,,,,,
Baltic,19,,,,,,,,,,,,
West_Med,28,36,,,,,,,,,,,
West_Asian,26,32,36,,,,,,,,,,
East_Med,26,35,28,21,,,,,,,,,
Red_Sea,52,62,50,48,39,,,,,,,,
South_Asian,64,65,76,57,60,82,,,,,,,
East_Asian,114,114,122,110,111,127,76,,,,,,
Siberian,111,111,123,109,112,130,83,56,,,,,
Amerindian,138,137,154,138,144,161,120,113,105,,,,
Oceanian,179,181,187,177,176,191,146,166,177,217,,,
Northeast_African,122,127,124,116,108,121,113,145,151,185,203,,
Sub-Saharan,146,150,150,140,135,141,133,164,170,204,220,41,",r=1))/1000

mds=cmdscale(fst,ncol(as.matrix(fst))-1)
t2=as.data.frame(as.matrix(t)%*%mds)
sort(as.matrix(dist(t2))[,"Selkup"])

Then Dolgans remain further from Ethiopians than from Europeans:

In the images below, the second dimension of the MDS plot differentiates Native Americans from Australo-Melanesians, because the biggest FST distance in K13 is between the Oceanian and Sub-Saharan components (.220), and the second biggest distance is between the Oceanian and Amerindian components (.217):

Now also the correlation with f2 distance becomes better, so most populations are close to the diagonal in the plot below. On the right side of the diagonal, there are populations that have drift which K13 doesn't account for, like Kalashes, Orcadians, and Chukchi. On the left side of the diagonal, there are mixed populations with low driftedness, like some Central Asians and Hungarians.
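
A minimal sketch of why the MDS approach behaves better, again on an assumed four-component subset of the K13 FST table with hypothetical populations: `cmdscale` gives each component coordinates whose Euclidean distances reproduce the FST values, so multiplying percentages by those coordinates places each population at the weighted average position of its components.

```r
# Four K13 components picked for illustration; FST values from the K13 table
comp <- c("Baltic", "East_Asian", "Siberian", "Sub-Saharan")
fst <- matrix(c(
    0, 114, 111, 150,
  114,   0,  56, 164,
  111,  56,   0, 170,
  150, 164, 170,   0
), 4, 4, byrow = TRUE, dimnames = list(comp, comp)) / 1000

# Classical MDS: coordinates for each component in n-1 = 3 dimensions
mds <- cmdscale(as.dist(fst), k = 3)

# Hypothetical populations: four unadmixed ones plus one 50/50 mix
q <- rbind(diag(4), c(0.5, 0, 0.5, 0))
dimnames(q) <- list(c(paste0("pure_", comp), "mix_Baltic_Siberian"), comp)

# Each population lands at the percentage-weighted mean of its component coordinates
coords <- q %*% mds
d <- as.matrix(dist(coords))

# Distances between the unadmixed populations match the FST values
max(abs(d[1:4, 1:4] - fst))

# The 50/50 mix sits halfway between its two source components
d["mix_Baltic_Siberian", c("pure_Baltic", "pure_Siberian")]
```

For this subset the FST values can be embedded exactly in three dimensions. With the full 13-component matrix that may not hold, in which case `cmdscale` keeps only the dimensions with positive eigenvalues and the match would be approximate.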