Komintasavalta
02-20-2021, 12:19 PM
I used data from the file "Global25 pop averages modern scaled": https://eurogenes.blogspot.com/2019/07/getting-most-out-of-global25_12.html.
The numbers displayed in the heatmap are the same Euclidean distances that are shown by Vahaduo, but I multiplied them by 100 so I could make them fit the cells of the heatmap better, and because integers are nice.
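To make the scaling concrete, here is a minimal Ruby sketch of what one heatmap cell holds: the Euclidean distance between two coordinate vectors, multiplied by 100 and printed without decimals. The short vectors are made up for illustration; real G25 rows have 25 coordinates.

```ruby
# Euclidean distance between two coordinate vectors of equal length.
def euclidean(a, b)
  raise ArgumentError, "vectors differ in length" unless a.size == b.size
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Distance multiplied by 100 and printed without decimals,
# like number_format="%.0f" in the pheatmap call.
def heatmap_label(a, b)
  format("%.0f", 100 * euclidean(a, b))
end

# Hypothetical 2-dimensional coordinates, not real G25 values.
p1 = [0.00, 0.00]
p2 = [0.03, 0.04]
puts format("%.3f", euclidean(p1, p2)) # 0.050
puts heatmap_label(p1, p2)             # 5
```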
Distances based on G25 have to be taken with a grain of salt, but based on the distances shown in the image below, Kola Saami are closer to Komis than to other Saami, and the population closest to Komis is Kola Saami. Bashkirs are far from every population. Maris are closest to non-Kola Saami. Non-Kola Saami are closest to Kola Saami, followed by Udmurts. Kazan Tatars are closest to Mishars, followed by Besermyans.
https://i.ibb.co/HPTdmNH/g25-euclidean-northeast-heatmap.png
brew install R
R -e 'install.packages(c("pheatmap","RColorBrewer"),repos="https://cloud.r-project.org")'
curl -Ls 'drive.google.com/uc?export=download&id=1wZr-UOve0KUKo_Qbgeo27m-CQncZWb8y'>modernave
awk -F, 'NR==FNR{a[$0];next}$1 in a' <(printf %s\\n Bashkir Besermyan Chuvash Estonian Finnish Karelian Komi Mari Mordovian Russian_Kostroma Russian_Pinega Saami Saami_Kola Tatar_Kazan Tatar_Mishar Udmurt Vepsian) modernave>selected
R -e 'library("pheatmap");library("RColorBrewer");t<-read.csv("selected",header=F,row.names=1);t2<-100*as.matrix(dist(t,upper=T));diag(t2)<-NA;
pheatmap(t2,filename="output.png",main="G25 Euclidean distances multiplied by 100",cellwidth=12,cellheight=12,fontsize=9,border_color=NA,
display_numbers=T,number_format="%.0f",fontsize_number=7,number_color="black",color=rev(colorRampPalette(brewer.pal(11,"Spectral"))(256)))'
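The commands above first keep only the rows of `modernave` whose first field exactly matches one of the listed names (in awk, `NR==FNR` is only true while the first input, the name list, is being read), then compute every pairwise distance with `dist`, multiply by 100 and set the diagonal to `NA` so the zero self-distances are not shown. A rough Ruby sketch of the same two steps, assuming the CSV lines are already in memory; the function names, population names and coordinates are made up for illustration:

```ruby
require "set"

# Keep the CSV lines whose first field exactly matches a wanted name,
# like awk -F, 'NR==FNR{a[$0];next}$1 in a' names modernave,
# and return them as { name => [coordinates] }.
def select_rows(lines, wanted)
  keep = wanted.to_set
  lines.map { |l| l.chomp.split(",") }
       .select { |fields| keep.include?(fields.first) }
       .map { |name, *coords| [name, coords.map(&:to_f)] }
       .to_h
end

# Pairwise Euclidean distances multiplied by 100, with nil on the
# diagonal in place of R's NA (diag(t2)<-NA).
def scaled_matrix(pops)
  names = pops.keys
  names.map do |r|
    names.map do |c|
      next nil if r == c
      100 * Math.sqrt(pops[r].zip(pops[c]).sum { |x, y| (x - y)**2 })
    end
  end
end

# Hypothetical input lines, not real G25 data.
lines = ["PopA,0,0", "PopA_sub,1,1", "PopB,0.03,0.04", "PopC,0.06,0.08"]
pops = select_rows(lines, %w[PopA PopB PopC])
scaled_matrix(pops).each_with_index do |row, i|
  cells = row.map { |v| v.nil? ? "  -" : format("%3.0f", v) }
  puts "#{pops.keys[i].ljust(5)} #{cells.join(' ')}"
end
```

Because the whole first field is compared, `PopA_sub` is not picked up when only `PopA` is requested, which a plain substring search such as `grep PopA` would not guarantee.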
Here are also some Siberian population averages:
https://i.ibb.co/fXb8RQc/g25-euclidean-siberia-heatmap.png
Here are 16 random populations:
https://i.ibb.co/dKNm042/g25-euclidean-random-heatmap.png
Note that the heatmaps above use different ranges of numbers for the color scale. You can give an argument like `breaks=seq(0,14.88,14.88/256)` for `pheatmap` to use a fixed scale from 0 to 14.88.
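`pheatmap` expects `breaks` to be one element longer than the colour vector, which is why 256 colours go with the 257 values that `seq(0,14.88,14.88/256)` produces. As a rough sketch of how a fixed scale assigns colours, assuming right-closed bins where the lowest bin also includes its left edge (as R's `cut(..., include.lowest=TRUE)` does), a value can be mapped to one of the 256 colour indices like this. It illustrates the idea and is not pheatmap's own code; `fixed_breaks` and `color_index` are hypothetical names.

```ruby
# n+1 evenly spaced break points from 0 to max, like seq(0,max,max/n) in R.
def fixed_breaks(max, n)
  (0..n).map { |i| max * i / n }
end

# 0-based colour index for a value: bin k covers (breaks[k], breaks[k+1]],
# and the first bin also includes breaks[0]. Returns nil outside the scale.
def color_index(value, breaks)
  return nil if value < breaks.first || value > breaks.last
  return 0 if value == breaks.first
  breaks.bsearch_index { |b| b >= value } - 1
end

breaks = fixed_breaks(14.88, 256)
puts breaks.size                     # 257
puts color_index(0, breaks)          # 0
puts color_index(14.88, breaks)      # 255
puts color_index(20, breaks).inspect # nil
```

With the same breaks passed to every plot, a given distance lands in the same colour bin in every heatmap, so the colours become comparable across images.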
It's easy to calculate Euclidean distances in R:
$ Rscript -e 'round(dist(read.csv("modernave",row.names=1,header=T)[c("Chuvash","Khanty","Komi","Mari","Nenets","Udmurt"),],upper=T),3)'
        Chuvash Khanty  Komi  Mari Nenets Udmurt
Chuvash          0.205 0.073 0.056  0.302  0.049
Khanty    0.205        0.247 0.173  0.112  0.186
Komi      0.073  0.247       0.125  0.343  0.067
Mari      0.056  0.173 0.125        0.270  0.082
Nenets    0.302  0.112 0.343 0.270         0.286
Udmurt    0.049  0.186 0.067 0.082  0.286
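`upper=T` only affects how the result is printed: an R `dist` object stores just the lower triangle, n(n-1)/2 values for n populations, column by column, and the `?dist` help page gives the (1-based) position of the distance between rows i < j as `n*(i-1) - i*(i-1)/2 + j-i`. A small Ruby sketch of that layout, with hypothetical integer coordinates so the distances come out exact:

```ruby
# Lower triangle of the distance matrix stored column by column,
# the same order R's dist() uses: (1,2), (1,3), ..., (1,n), (2,3), ...
def condensed_dist(points)
  n = points.size
  out = []
  (0...n).each do |i|
    ((i + 1)...n).each do |j|
      out << Math.sqrt(points[i].zip(points[j]).sum { |x, y| (x - y)**2 })
    end
  end
  out
end

# Distance between rows i and j (1-based, i != j) using the formula
# from ?dist; the trailing -1 converts to a 0-based Ruby index.
def dist_at(d, n, i, j)
  raise ArgumentError, "i and j must differ" if i == j
  i, j = j, i if i > j
  d[n * (i - 1) - i * (i - 1) / 2 + j - i - 1]
end

pts = [[0, 0], [3, 4], [6, 8]] # hypothetical coordinates
d = condensed_dist(pts)
p d                   # [5.0, 10.0, 5.0]
p dist_at(d, 3, 3, 1) # 10.0
```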
$ Rscript -e 't<-read.csv("modernave",row.names=1,header=T);round(head(sort(as.matrix(dist(t))["Chuvash",]),8),3)'
Chuvash Besermyan Udmurt Mari Tatar_Kazan Saami Komi Saami_Kola
0.000 0.048 0.049 0.056 0.064 0.071 0.073 0.077
$ Rscript -e 't<-read.csv("modernave",row.names=1,header=T);p<-t["Chuvash",];head(round(sort(apply(t,1,function(x)sqrt(sum((x-p)^2)))),3),8)'
Chuvash Besermyan Udmurt Mari Tatar_Kazan Saami Komi Saami_Kola
0.000 0.048 0.049 0.056 0.064 0.071 0.073 0.077
You can also use awk:
$ awk -F, 'NR==FNR{for(i=2;i<=NF;i++)a[i]=$i;next}{s=0;for(i=2;i<=NF;i++)s+=($i-a[i])^2;print s^.5","$1}' <(grep Chuvash, modernave) modernave|sort -n|awk -F, '{printf"%.03f %s\n",$1,$2}'|sed s/^0//|head -n8
.000 Chuvash
.048 Besermyan
.049 Udmurt
.056 Mari
.064 Tatar_Kazan
.071 Saami
.073 Komi
.077 Saami_Kola
Or use this Ruby script:
$ cat ~/bin/eud
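#!/usr/bin/env ruby
require "optparse"
opt={}
# -m NUM: show only the NUM closest populations; -f NUM: number of decimals (default 3)
OptionParser.new{|x|
x.on("-m NUM",Integer){|y|opt[:m]=y}
x.on("-f NUM",Integer){|y|opt[:f]=y}
}.parse!
# First file: the populations to compare against, as [name, [coordinates]]
a=IO.readlines(ARGV[0]).map{|l|x,*y=l.chomp.split(",");[x,y.map(&:to_f)]}
# Second file: one block of output for each target population
puts IO.readlines(ARGV[1]).map{|l|
x,*y=l.chomp.split(",")
y.map!(&:to_f)
# Euclidean distance to every other population, sorted from closest to farthest
d=a.reject{|z|z[0]==x}.map{|z|[z[1].map.with_index{|v,i|(v-y[i])**2}.sum**0.5,z[0]]}.sort_by(&:first)
d=d.take(opt[:m])if opt[:m]
"Distance to: #{x}\n"+d.map{|dist,name|("%.#{opt[:f]||3}f"%dist).sub(/^0/,"")+" "+name}*"\n"
}*"\n\n"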
$ eud -m8 modernave <(grep Chuvash modernave)
Distance to: Chuvash
.048 Besermyan
.049 Udmurt
.056 Mari
.064 Tatar_Kazan
.071 Saami
.073 Komi
.077 Saami_Kola
.087 Tatar_Mishar
The distances calculated by Vahaduo are also simple Euclidean distances:
https://i.ibb.co/GV3fk2G/20210220151009.jpg