When a numeric matrix has three columns, no negative values, and rows that each add up to the same constant, it can be visualized as a ternary plot, where each row of the matrix is drawn as a point inside an equilateral triangle: https://en.wikipedia.org/wiki/Ternary_plot. You draw an equilateral triangle centered at the origin, with a vector pointing from the origin to each corner of the triangle, and then calculate the coordinates of each point as a linear combination of those vectors, using the three values of the row as weights. Because every row adds up to a constant, there is a one-to-one correspondence between points within the triangle and rows of the matrix: the third column always equals the constant minus the sum of the first and second columns, so two coordinates are enough to recover all three values.
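The linear combination described above can be sketched with awk. This is only an illustration under my own assumptions: the function name `polygon_xy` is made up, rows are assumed to sum to 1, and the corners are placed on the unit circle starting from the top and going clockwise, without any rescaling or recentering.

```shell
# Map rows of proportions (columns summing to 1) to x,y coordinates inside
# a regular polygon with one corner per column.
# Assumption: corner i (0-based) sits on the unit circle at angle 2*pi*i/n,
# measured clockwise from the top, so its vector is (sin(angle), cos(angle)).
polygon_xy() {
  awk '{
    pi = atan2(0, -1)
    x = 0; y = 0
    for (i = 1; i <= NF; i++) {
      angle = 2 * pi * (i - 1) / NF
      x += $i * sin(angle)   # weight times x of corner vector
      y += $i * cos(angle)   # weight times y of corner vector
    }
    printf "%.3f %.3f\n", x, y
  }'
}

printf '%s\n' '1 0 0' '0 1 0' '0.5 0.5 0' | polygon_xy
# 0.000 1.000
# 0.866 -0.500
# 0.433 0.250
```

A row with 100% of one component lands exactly on a corner, and a 50/50 mix of two components lands on the midpoint of the edge between their corners.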
It is possible to extend the concept of a ternary plot in order to draw a square plot for a matrix with four columns, a pentagon-shaped plot for a matrix with five columns, and so on. However, there is then no longer a one-to-one correspondence between points within the polygon and rows of the matrix. For example, in a square plot a point in the middle of the plot can represent either 25% of all four components or 50% of each of two opposite components.
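The loss of uniqueness in a square plot is easy to check by hand. In the sketch below I assume the four corners are at the top, right, bottom, and left of the unit circle, in that column order, so the projection reduces to two differences. The function name `square_xy` is hypothetical.

```shell
# Square plot with corners at top (0,1), right (1,0), bottom (0,-1), left (-1,0).
# Assumption: columns are ordered top, right, bottom, left.
# The linear combination of the corner vectors simplifies to
#   x = right - left,  y = top - bottom
square_xy() {
  awk '{ printf "%.2f %.2f\n", $2 - $4, $1 - $3 }'
}

printf '%s\n' '0.25 0.25 0.25 0.25' '0.5 0 0.5 0' '0 0.5 0 0.5' | square_xy
# 0.00 0.00
# 0.00 0.00
# 0.00 0.00
```

All three rows land on the center of the square, because four proportions with one sum constraint leave three degrees of freedom, and a flat plot can only show two.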
I selected almost all modern European samples from the 1240K+HO dataset. I excluded duplicate samples, excluded one sample from each pair of samples with a PI_HAT of .3 or above, and kept at most 16 samples per population. I then ran ADMIXTURE at each K value from 3 to 8 and visualized the results as polygonal diagrams.
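The cap of 16 samples per population could be applied with a one-line awk filter. This is a sketch and not necessarily how the cap was applied for these runs: I assume a two-column input with the sample ID in the first column and the population name in the second, and the function name `cap_per_pop` is made up.

```shell
# Keep at most N rows per population (default 16).
# Assumption: input has the sample ID in column 1 and the population in column 2.
# The first N rows seen for each population are kept, so the input order
# decides which samples survive.
cap_per_pop() {
  awk -v max="${1:-16}" '++n[$2] <= max'
}

printf '%s\n' 's1 Basque' 's2 Basque' 's3 Basque' 's4 Kalmyk' | cap_per_pop 2
# s1 Basque
# s2 Basque
# s4 Kalmyk
```

If the input is sorted so that the best samples of each population come first, for example by SNP count, the filter keeps the best ones.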
In the images below, I reordered the admixture components so that the Kalmyk component is always at the top of the diagram. There were no Nenets samples in the dataset I used, so I used Kalmyks as the reference population for the eastern end of the cline. I placed Northern Europeans on the right side of Kalmyks, because there is a cline from Northern Europeans to Kalmyks in Northeastern Europe, and I placed North Caucasians on the left side of Kalmyks, because Nogais are intermediate between Caucasians and Kalmyks.
The image below shows population averages from the same ADMIXTURE runs visualized as heatmaps. The clustering is based on a matrix where the columns of each run have been joined into a single wide matrix.
At K=3, the middle component looks like a WHG-like (Western Hunter-Gatherer) component, because its proportion is highest in Basques and Lithuanians. The left component is maximal in Kalmyks, but it is also pulled toward Volga-Ural populations and Nogais, so even Udmurts have 42% of the left component. Nogais are the only population that has a large proportion of both the first and third components. Nogais from Stavropol are from north of the North Caucasus, and Nogais from Astrakhan are from between Kalmykia and Kazakhstan. Compared to them, Nogais from Karachay-Cherkessia (North Caucasus) are closer to Caucasians and have less East Eurasian ancestry.
At K=4, the middle component splits into a Northern European component, which is maximal in Estonians and Lithuanians, and a Southern European component, which is maximal in Sardinians. However, even Greeks still have 35% of the Caucasian component. The proportion of the Kalmyk-maximal component in Udmurts also decreases from 42% to 34%.
At K=5, a ghost component of unclear origin splits off from the Northern European component. Its proportion is highest in Arkhangelsk Russians, Gagauzes, and Moldovans. At K=5, Kazan Tatars still have 14% of the Caucasian component but Chuvashes only have 2%. Bashkirs have 8% of the Caucasian component, 36% of the Kalmyk-maximal component, and 56% of the Northern European component. However, the Bashkir samples are from Jeong et al. 2019, which included both northern and southern Bashkirs, and the southern Bashkir samples had much higher East Eurasian ancestry.
At K=6, the Southern European component splits into a Sardinian component and a Maltese component. The Maltese component is rarer, and it reaches a high percentage only in Maltese, Ashkenazis, and Sicilians. Caucasians were overrepresented in this run, so the Caucasian component also splits into two different components at K=6.
At K=8, a Uralic component that is maximal in Vepsians appears. If these runs had included more samples of non-Finnic Finno-Permic populations like Saami, Maris, or Komis, the Uralic component might have carried more East Eurasian ancestry, or it would have appeared at an earlier K value.
Download required data and software:
1240K+HO dataset: https://reich.hms.harvard.edu/allen-...cient-dna-data
ADMIXTURE: https://github.com/NovembreLab
Binaries for PLINK 1.9: https://www.cog-genomics.org/plink2/
Compile EIGENSOFT from source: https://reich.hms.harvard.edu/software
Mac binaries for EIGENSOFT 7.2.1: https://drive.google.com/file/d/1H8k...ew?usp=sharing
Mac binaries for an old fork of EIGENSOFT: https://github.com/chrchang/eigensoft
Download the 1240K+HO dataset and run ADMIXTURE:
Code:
# Download and unpack the v44.3 1240K+HO release
curl -LsO reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V44/V44.3/SHARE/public.dir/v44.3_HO_public.tar
tar -xf v44.3_HO_public.tar

# Convert from EIGENSTRAT format to PLINK bed/bim/fam
f=v44.3_HO_public
convertf -p <(printf %s\\n genotypename:\ $f.geno snpname:\ $f.snp indivname:\ $f.ind outputformat:\ PACKEDPED genotypeoutname:\ $f.bed snpoutname:\ $f.bim indivoutname:\ $f.fam)

# Populations to include
x=euro5
printf %s\\n Albanian Basque Basque.SDG Belarusian Bulgarian Cretan.DG Croatian Czech English French French.SDG \
  Greek Icelandic Italian_North Italian_South Lithuanian Maltese Moldavian Norwegian Norwegian.DG Orcadian Orcadian.SDG \
  Polish.DG Romanian Russian Russian.SDG Russian_Archangelsk_Krasnoborsky Russian_Archangelsk_Leshukonsky \
  Russian_Archangelsk_Pinezhsky Sardinian Scottish Sicilian Spanish Spanish_North Ukrainian Ukrainian_North \
  Besermyan Estonian Finnish Finnish.DG Hungarian Karelian Mordovian Saami.DG Udmurt Veps \
  Chuvash Gagauz Tatar_Kazan Tatar_Mishar \
  Abazin Adygei Adygei.SDG Avar Balkar Chechen Circassian Darginian Ingushian Kabardinian Kaitag Karachai Kumyk Lak \
  Lezgin Lezgin.DG Ossetian Tabasaran \
  Bashkir Jew_Ashkenazi Kalmyk Nogai_Astrakhan Nogai_Karachay_Cherkessia Nogai_Stavropol>$x.pop

# Sort the annotation file by column 15 in descending order, keep the first row for each
# value of column 3 (drops duplicates), print columns 2 and 8, and keep the rows whose
# column 8 label is in the population list
sed 1d v44.3_HO_public.anno|sort -t$'\t' -rnk15|awk -F\\t '!a[$3]++{print$2,$8}'|awk 'NR==FNR{a[$0];next}$2 in a' $x.pop ->$x.temp.pick
plink --allow-no-sex --bfile v44.3_HO_public --keep <(awk 'NR==FNR{a[$1];next}$2 in a' $x.temp.pick v44.3_HO_public.fam) --make-bed --out $x.temp

# Compute pairwise PI_HAT (column 10 of the .genome file) and drop one sample from each related pair
plink --allow-no-sex --bfile $x.temp --genome --out $x
awk 'FNR>1&&$10>=.25{print$2<$4?$2:$4}' $x.genome|awk 'NR==FNR{a[$0];next}!($1 in a)' - $x.temp.pick>$x.pick
plink --allow-no-sex --bfile v44.3_HO_public --keep <(awk 'NR==FNR{a[$1];next}$2 in a' $x.pick v44.3_HO_public.fam) --make-bed --out $x

# LD pruning
plink --allow-no-sex --bfile $x --indep-pairwise 50 10 .05 --out $x
plink --bfile $x --extract $x.prune.in --make-bed --out $x.pruned

# Average columns 2..NF by the label in column 1 (needs gawk for arrays of arrays);
# the optional argument is the field separator, which is a tab by default
tav()(awk '{n[$1]++;for(i=2;i<=NF;i++){a[$1][i]+=$i}}
END{for(i in a){o=i;for(j=2;j<=NF;j++)o=o FS sprintf("%f",a[i][j]/n[i]);print o}}' "FS=${1-$'\t'}")

# Run ADMIXTURE at K=3..8; write per-sample results to $x.$k and population averages to $x.$k.ave
for k in {3..8};do
  admixture -j4 -C .1 $x.pruned.bed $k
  paste -d' ' <(awk 'NR==FNR{a[$1]=$2;next}{print$2,a[$2]}' $x.pick $x.pruned.fam) $x.pruned.$k.Q>$x.$k
  cut -d' ' -f2- $x.$k|tav \ >$x.$k.ave
done
Generate polygonal diagrams:
Code:
library(tidyverse)
library(ggforce)
library(ggrepel)

for(n in c(3,4,5,6,7,8)){
  # columns of euro5.K: sample ID, population, then K admixture proportions
  t=read.table(paste0("euro5.",n))
  rownames(t)=paste0(t[,2],":",t[,1])
  t=t[,-c(1,2)]

  # manual component order for each K, so that the Kalmyk component is at the top
  columnorder=list(c(2,1,3),c(4,3,2,1),c(2,5,4,3,1),c(4,1,2,3,6,5),c(1,7,5,3,2,6,4),c(2,3,7,4,1,8,5,6))
  t=t[,columnorder[[n-2]]]

  # corners of a regular n-gon, starting from the top, rescaled and vertically centered
  corners=sapply(c(sin,cos),function(x)head(x(seq(0,2,length.out=n+1)*pi),-1))
  corners=corners*min(2/diff(apply(corners,2,range)))
  corners[,2]=corners[,2]-mean(range((corners[,2])))

  # each sample is a linear combination of the corner vectors
  xy=as.data.frame(as.matrix(t)%*%corners)

  # grid lines: polygon edges and spokes from each corner to the center
  grid=as.data.frame(rbind(cbind(corners,rbind(corners[-1,],corners[1,])),cbind(corners,matrix(apply(corners,2,mean),ncol=2,nrow=n,byrow=T))))

  pop=sub(":.*","",rownames(xy))
  pop=sub("\\.(DG|SDG|SG|WGA)","",pop)
  centers=aggregate(xy,by=list(pop),mean)
  xy$pop=pop

  # shuffle the colors; any fixed seed works, it only keeps the shuffle reproducible
  set.seed(1)
  color=as.factor(sample(seq(1,length(unique(xy$pop)))))
  cl=rbind(c(60,80),c(25,95),c(30,70),c(70,50),c(60,100),c(20,50),c(15,40))
  hues=max(ceiling(length(color)/nrow(cl)),2)
  pal1=as.vector(apply(cl,1,function(x)hcl(head(seq(15,375,length=hues+1),-1),x[1],x[2])))
  pal2=as.vector(apply(cl,1,function(x)hcl(head(seq(15,375,length=hues+1),-1),ifelse(x[2]>=60,.5*x[1],.1*x[1]),ifelse(x[2]>=60,.2*x[2],95))))

  # small jitter so that the Voronoi tiling does not get identical points
  xy$V1=xy$V1+runif(nrow(xy))/1e3
  xy$V2=xy$V2+runif(nrow(xy))/1e3

  lims=apply(corners,2,range)+c(-.08,.08)

  ggplot(xy,aes(x=V1,y=V2))+
    geom_segment(data=grid,aes(x=V1,y=V2,xend=V3,yend=V4),color="gray85",size=.3)+
    geom_voronoi_tile(aes(group=0,fill=color[as.factor(pop)],color=color[as.factor(pop)]),size=.07,max.radius=.055)+
    geom_label_repel(data=centers,aes(x=V1,y=V2,label=Group.1,color=color,fill=color),max.overlaps=Inf,point.size=0,size=2.3,alpha=.8,label.r=unit(.1,"lines"),label.padding=unit(.1,"lines"),label.size=.1,box.padding=0,segment.size=.3)+
    coord_fixed(xlim=lims[,1],ylim=lims[,2],expand=F)+
    scale_fill_manual(values=pal1)+
    scale_color_manual(values=pal2)+
    theme(
      axis.text=element_blank(),
      axis.ticks=element_blank(),
      axis.title=element_blank(),
      legend.position="none",
      panel.background=element_rect(fill="white")
    )

  ggsave(paste0(n,".png"),width=7,height=7)
}
Use ComplexHeatmap to combine heatmaps for different K values (https://jokergoo.github.io/ComplexHe...eference/book/):
Code:
library(ComplexHeatmap)
library(circlize)
library(colorspace)
library(vegan) # for reorder() on hclust objects

kvals=c(3,4,5,6,7,8)
# columnorder=lapply(kvals,seq)
columnorder=list(c(2,1,3),c(4,3,2,1),c(2,5,4,3,1),c(4,1,2,3,6,5),c(1,7,5,3,2,6,4),c(2,3,7,4,1,8,5,6))

# one matrix of population averages (in percent) per K value
mats=sapply(1:length(kvals),function(i){
  t=100*read.table(paste0("euro5.",kvals[i],".ave"),row.names=1)[,columnorder[[i]]]
  rownames(t)=sub("Cherkessia","Cher",sub("Russian_Archangelsk_","Rus_Arch_",rownames(t)))
  data.frame(aggregate(t,list(sub("\\.(DG|SDG|SG|WGA)|_1|_2","",row.names(t))),mean),row.names=1)
})

png("a.png",w=6000,h=5000,res=144)

maps=sapply(kvals,function(k){
  mat=as.matrix(mats[match(k,kvals)][[1]])
  Heatmap(
    mat,
    show_heatmap_legend=F,
    show_column_names=F,
    show_row_names=F,
    clustering_distance_rows="euclidean",
    width=ncol(mat)*unit(30,"pt"),
    height=nrow(mat)*unit(30,"pt"),
    row_dend_width=unit(200,"pt"),
    cluster_columns=F,
    # cluster rows on all runs joined into one wide matrix, and reorder the branches by the K=4 run
    cluster_rows=reorder(hclust(dist(do.call(cbind,mats))),-mats[[2]][,2]-2*mats[[2]][,1]),
    column_title=paste0("K=",k),
    column_title_gp=gpar(fontsize=24),
    right_annotation=rowAnnotation(text1=anno_text(gt_render(rownames(mat),padding=unit(c(2,2,2,2),"mm")),just="left",location=unit(0,"npc"),gp=gpar(fontsize=17))),
    col=colorRamp2(seq(0,100,length.out=7),hex(HSV(c(210,210,130,60,40,20,0),c(0,rep(.5,6)),1))),
    cell_fun=function(j,i,x,y,w,h,fill)grid.text(sprintf("%.0f",mat[i,j]),x,y,gp=gpar(fontsize=15))
  )
})

draw(Reduce(`+`,maps))
dev.off()

# trim the margins with ImageMagick
system("mogrify -gravity center -trim -border 16 -bordercolor white a.png")