Results 1 to 7 of 7

Thread: Heatmaps of G25 Euclidean distances with R

  1. #1
    Banned
    Join Date
    Sep 2020
    Last Online
    09-12-2023 @ 03:47 PM
    Location
    コミ共和国
    Meta-Ethnicity
    Finno-Permic
    Ethnicity
    Peasant
    Ancestry
    コミ
    Country
    Finland
    Taxonomy
    Karaboğa (euryprosopic, platyrrhine, dolichocephalic)
    Relationship Status
    Virgin
    Gender
    Posts
    2,170
    Thumbs Up
    Received: 4,862
    Given: 2,946

    5 Not allowed!

    Default Heatmaps of G25 Euclidean distances with R

    I used data from the file "Global25 pop averages modern scaled": https://eurogenes.blogspot.com/2019/...obal25_12.html.

    The numbers displayed in the heatmap are the same Euclidean distances that are shown by Vahaduo, but I multiplied them by 100 so I could make them fit the cells of the heatmap better, and because integers are nice.

    Distances based on G25 have to be taken with a grain of salt, but based on the distances shown in the image below, Kola Saami are closer to Komis than to other Saami. The population that is closest to Komis are Kola Saami. Bashkirs are far from every population. Maris are the closest to non-Kola Saami. Non-Kola Saami are closest to Kola Saami, followed by Udmurts. Kazan Tatars are closest to Mishars, followed by Besermyans.



    Code:
    brew install R
    R -e 'install.packages(c("pheatmap","RColorBrewer"),repos="https://cloud.r-project.org")'
    curl -Ls 'drive.google.com/uc?export=download&id=1wZr-UOve0KUKo_Qbgeo27m-CQncZWb8y'>modernave
    awk -F, 'NR==FNR{a[$0];next}$1 in a' <(printf %s\\n Bashkir Besermyan Chuvash Estonian Finnish Karelian Komi Mari Mordovian Russian_Kostroma Russian_Pinega Saami Saami_Kola Tatar_Kazan Tatar_Mishar Udmurt Vepsian) modernave>selected
    R -e 'library("pheatmap");library("RColorBrewer");t<-read.csv("selected",header=F,row.names=1);t2<-100*as.matrix(dist(t,upper=T));diag(t2)<-NA;
    pheatmap(t2,filename="output.png",main="G25 Euclidean distances multiplied by 100",cellwidth=12,cellheight=12,fontsize=9,border_color=NA,
    display_numbers=T,number_format="%.0f",fontsize_number=7,number_color="black",rev(colorRampPalette(brewer.pal(11,"Spectral"))(256)))'
    Here are also some Siberian population averages:



    Here are 16 random populations:



    Note that the heatmaps above use different ranges of numbers for the color scale. You can give an argument like `breaks=seq(0,14.88,14.88/256)` for `pheatmap` to use a fixed scale from 0 to 14.88.

    It's easy to calculate Euclidean distances in R:

    Code:
    $ Rscript -e 'round(dist(read.csv("modernave",row.names=1,header=T)[c("Chuvash","Khanty","Komi","Mari","Nenets","Udmurt"),],upper=T),3)'
            Chuvash Khanty  Komi  Mari Nenets Udmurt
    Chuvash          0.205 0.073 0.056  0.302  0.049
    Khanty    0.205        0.247 0.173  0.112  0.186
    Komi      0.073  0.247       0.125  0.343  0.067
    Mari      0.056  0.173 0.125        0.270  0.082
    Nenets    0.302  0.112 0.343 0.270         0.286
    Udmurt    0.049  0.186 0.067 0.082  0.286
    $ Rscript -e 't<-read.csv("modernave",row.names=1,header=T);round(head(sort(as.matrix(dist(t))["Chuvash",]),8),3)'
        Chuvash   Besermyan      Udmurt        Mari Tatar_Kazan       Saami        Komi  Saami_Kola
          0.000       0.048       0.049       0.056       0.064       0.071       0.073       0.077
    $ Rscript -e 't<-read.csv("modernave",row.names=1,header=T);p<-t["Chuvash",];head(round(sort(apply(t,1,function(x)sqrt(sum((x-p)^2)))),3),8)'
        Chuvash   Besermyan      Udmurt        Mari Tatar_Kazan       Saami        Komi  Saami_Kola
          0.000       0.048       0.049       0.056       0.064       0.071       0.073       0.077
    You can also use awk:

    Code:
    $ awk -F, 'NR==FNR{for(i=2;i<=NF;i++)a[i]=$i;next}{s=0;for(i=2;i<=NF;i++)s+=($i-a[i])^2;print s^.5","$1}' <(grep Chuvash, modernave) modernave|sort -n|awk -F, '{printf"%.03f %s\n",$1,$2}'|sed s/^0//|head -n8
    .000 Chuvash
    .048 Besermyan
    .049 Udmurt
    .056 Mari
    .064 Tatar_Kazan
    .071 Saami
    .073 Komi
    .077 Saami_Kola
    Or use this Ruby script:

    Code:
    $ cat ~/bin/eud
    #!/usr/bin/env ruby -roptparse
    
    opt={}
    OptionParser.new{|x|
      x.on("-m NUM",Integer){|y|opt[:m]=y}
      x.on("-f NUM",Integer){|y|opt[:f]=y}
    }.parse!
    
    a=IO.readlines(ARGV[0]).map{|l|x,*y=l.chomp.split(",");[x,y.map(&:to_f)]}
    
    puts IO.readlines(ARGV[1]).map{|l|
      x,*y=l.chomp.split(",")
      y.map!(&:to_f)
      d=a.reject{|z|z[0]==x}.map{|z|[z[1].map.with_index{|v,i|(v-y[i])**2}.sum**0.5,z[0]]}.sort_by(&:first)
      d=d.take(opt[:m])if opt[:m]
      "Distance to: #{x}\n"+d.map{|x,y|("%.#{opt[:f]||3}f"%x).sub(/^0/,"")+" "+y}*"\n"
    }*"\n\n"
    $ eud -m8 modernave <(grep Chuvash modernave)
    Distance to: Chuvash
    .048 Besermyan
    .049 Udmurt
    .056 Mari
    .064 Tatar_Kazan
    .071 Saami
    .073 Komi
    .077 Saami_Kola
    .087 Tatar_Mishar
    The distances calculated by Vahaduo are also simple Euclidean distances:


  2. #2
    Banned
    Join Date
    Sep 2020
    Last Online
    09-12-2023 @ 03:47 PM
    Location
    コミ共和国
    Meta-Ethnicity
    Finno-Permic
    Ethnicity
    Peasant
    Ancestry
    コミ
    Country
    Finland
    Taxonomy
    Karaboğa (euryprosopic, platyrrhine, dolichocephalic)
    Relationship Status
    Virgin
    Gender
    Posts
    2,170
    Thumbs Up
    Received: 4,862
    Given: 2,946

    4 Not allowed!

    Default

    This selects Northeast European populations, takes the four closest rows to each of them from the datasheet for ancient population averages, and makes a dendrogram of all rows:

    curl -Ls 'drive.google.com/uc?export=download&id=1wZr-UOve0KUKo_Qbgeo27m-CQncZWb8y'>modernave
    curl -Ls 'drive.google.com/uc?export=download&id=1F2rKEVtu8nWSm7qFhxPU6UESQNsmA-sl'>ancientave
    printf %s\\n Bashkir Besermyan Chuvash Finnish Finnish_East Ingrian Karelian Komi Latvian Lithuanian_PA Mari Mordovian Russian_Kostroma Russian_Pinega Russian_Tver Saami Saami_Kola Tatar_Kazan Tatar_Lipka Tatar_Mishar Udmurt Vepsian>pop
    awk -F, 'NR==FNR{a[$0];next}$1 in a' pop modernave>modernselected
    dist()(awk -F, 'NR==FNR{for(i=2;i<=NF;i++)a[i]=$i;next}$1!=""{s=0;for(i=2;i<=NF;i++)s+=($i-a[i])^2;print s^.5,$1}' "$1" "$2")
    cat modernselected|while read l;do dist <(echo "$l") ancientave|sort -n|head -n4;done|cut -d' ' -f2|awk -F, 'NR==FNR{a[$0];next}$1 in a' - ancientave>ancientselected
    cat {modern,ancient}selected>selected
    R -e 'install.packages("ape",repos="https://cloud.r-project.org")'
    R -e 'library("ape");png("output.png",w=750,h=900);plot(as.phylo(hclust(dist(read.csv("selected",header=F,row.names=1,check.names=F)))),type="phylogram",cex=1.3,no.margin=T,font=1);dev.off()'
    mogrify -trim -border 10 -bordercolor white output.png


  3. #3
    Veteran Member Ajeje Brazorf's Avatar
    Join Date
    Jun 2017
    Last Online
    Today @ 02:41 PM
    Ethnicity
    Italian‏‏‎
    Country
    Italy
    Gender
    Posts
    2,287
    Thumbs Up
    Received: 2,170
    Given: 9

    1 Not allowed!

    Default

    Hierarchical clustering of modern populations, which of these two makes more "sense" to you?

    Scaled
    Spoiler!


    Unscaled
    Spoiler!

  4. #4
    Banned
    Join Date
    Sep 2019
    Last Online
    07-29-2023 @ 05:42 PM
    Location
    --
    Meta-Ethnicity
    --
    Ethnicity
    ---
    Ancestry
    --
    Country
    United States
    Region
    Quebec City
    Y-DNA
    --
    mtDNA
    --
    Taxonomy
    --
    Politics
    --
    Religion
    -+
    Relationship Status
    Single
    Gender
    Posts
    10,090
    Thumbs Up
    Received: 6,244
    Given: 1,444

    1 Not allowed!

    Default

    Interesting but I'm not installing some random packages called : "pheatmap RColorBrewer" that are not in the OpenBSD ports system (dons tin foil hat). I'd have to read the source code first and I don't have time for that. Especially, when there is a website that can use for that -- no it is not the vahaduo website-- (I can contain my browser ) with unveil/pledge and 'secure the settings in my browser' and using a proxy :

    NAME
    unveil - unveil parts of a restricted filesystem view

    SYNOPSIS
    #include

    int
    unveil(const char *path, const char *permissions);

    DESCRIPTION
    The first call to unveil() that specifies a path removes visibility of
    the entire filesystem from all other filesystem-related system calls
    (such as open(2), chmod(2) and rename(2)), except for the specified path
    and permissions.

    The unveil() system call remains capable of traversing to any path in the
    filesystem, so additional calls can set permissions at other points in
    the filesystem hierarchy.

    After establishing a collection of path and permissions rules, future
    calls to unveil() can be disabled by passing two NULL arguments.
    Alternatively, pledge(2) may be used to remove the "unveil" promise.

    The permissions argument points to a string consisting of zero or more of
    the following characters:

    r Make path available for read operations, corresponding to the
    pledge(2) promise "rpath".
    w Make path available for write operations, corresponding to
    the pledge(2) promise "wpath".
    x Make path available for execute operations, corresponding to
    the pledge(2) promise "exec".
    c Allow path to be created and removed, corresponding to the
    pledge(2) promise "cpath".

    A path that is a directory will enable all filesystem access underneath
    path using permissions if and only if no more specific matching unveil()
    exists at a lower level. Directories are remembered at the time of a
    call to unveil(). This means that a directory that is removed and
    recreated after a call to unveil() will appear to not exist.

    Non-directory paths are remembered by name within their containing
    directory, and so may be created, removed, or re-created after a call to
    unveil() and still appear to exist.

    Attempts to access paths not allowed by unveil() will result in an error
    of EACCES when the permissions argument does not match the attempted
    operation. ENOENT is returned for paths for which no unveil()
    permissions qualify. After a process has terminated, lastcomm(1) will
    mark it with the `U' flag if file access was prevented by unveil().

    unveil() use can be tricky because programs misbehave badly when their
    files unexpectedly disappear. In many cases it is easier to unveil the
    directories in which an application makes use of files.

    https://man.openbsd.org/unveil

    The simple Rscript euclidian distances is something I would use though.

    How is your mac doing, BTW (I guess you are using Darwin ? ) :

    A previously undetected piece of malware found on almost 30,000 Macs worldwide is generating intrigue in security circles, which are still trying to understand precisely what it does and what purpose its self-destruct capability serves.

    Once an hour, infected Macs check a control server to see if there are any new commands the malware should run or binaries to execute. So far, however, researchers have yet to observe delivery of any payload on any of the infected 30,000 machines, leaving the malware's ultimate goal unknown. The lack of a final payload suggests that the malware may spring into action once an unknown condition is met.

    Also curious, the malware comes with a mechanism to completely remove itself, a capability that's typically reserved for high-stealth operations. So far, though, there are no signs the self-destruct feature has been used, raising the question why the mechanism exists. Besides those questions, the malware is notable for a version that runs natively on the M1 chip that Apple introduced in November, making it only the second known piece of macOS malware to do so...

    The malware has been found in 153 countries with detections concentrated in the US, UK, Canada, France, and Germany.

    Red Canary, the security firm that discovered the malware, has named it "Silver Sparrow."

    First detected in August of 2020, the Silver Sparrow malware is interesting in several unsettling ways. It uses the macOS Installer Javascript API to launch a bash process to gain a foothold into the user's system, a hitherto-unobserved method for bypassing malware detection. This bash shell is then used to invoke macOS's built-in PlistBuddy tool to create a LaunchAgent which executes a bash script every hour. This is the command and control process, which downloads a JSON file containing (potentially) new instructions.

    Besides the novel installation method, Silver Sparrow is also mysterious in its payload: a single, tiny binary that does nothing but open a window reading "Hello, World!" (in v1, which targets Intel Macs) or "You did it!" (in v2, which is an M1-compatible fat binary). These "bystander binaries" are never executed and appear to be proofs-of-concept or placeholders for future functionality.

    https://it.slashdot.org/story/21/02/...-security-pros

  5. #5
    Banned
    Join Date
    Sep 2020
    Last Online
    09-12-2023 @ 03:47 PM
    Location
    コミ共和国
    Meta-Ethnicity
    Finno-Permic
    Ethnicity
    Peasant
    Ancestry
    コミ
    Country
    Finland
    Taxonomy
    Karaboğa (euryprosopic, platyrrhine, dolichocephalic)
    Relationship Status
    Virgin
    Gender
    Posts
    2,170
    Thumbs Up
    Received: 4,862
    Given: 2,946

    3 Not allowed!

    Default

    Quote Originally Posted by JamesBond007 View Post
    Interesting but I'm not installing some random packages
    You can do a dendrogram in plain R like this, but then it cuts off the end of long labels like in the images posted by Ajeje Brazorf:

    Code:
    R -e 'png("output.png",w=700,h=8000);plot(as.dendrogram(hclust(dist(read.csv("Global25_PCA_modern_pop_averages_scaled.txt",header=T,row.names=1)))),horiz=T,yaxt="n");dev.off()'
    `horiz=T` rotates the plot 90 degrees and `yaxt="n"` removes the scale. Run `?plot.dendrogram` in R to see help.

    Quote Originally Posted by Ajeje Brazorf View Post
    Hierarchical clustering of modern populations, which of these two makes more "sense" to you?

    Scaled
    Spoiler!


    Unscaled
    Spoiler!
    You need scaled coordinates to calculate the distances correctly. In the unscaled version you posted, the Amerindian branch splits off from whites before the Baka pygmy branch. And Chuvashes and Maris split off from mainstream Europeans before SSAs.

    In the unscaled version, too little weight is given to the first components which explain most of the variance:

    Last edited by Komintasavalta; 02-21-2021 at 06:56 PM.

  6. #6
    Banned
    Join Date
    Sep 2020
    Last Online
    09-12-2023 @ 03:47 PM
    Location
    コミ共和国
    Meta-Ethnicity
    Finno-Permic
    Ethnicity
    Peasant
    Ancestry
    コミ
    Country
    Finland
    Taxonomy
    Karaboğa (euryprosopic, platyrrhine, dolichocephalic)
    Relationship Status
    Virgin
    Gender
    Posts
    2,170
    Thumbs Up
    Received: 4,862
    Given: 2,946

    5 Not allowed!

    Default

    I'm getting more powerful at R. This does a PCA where it uses K-means clustering to assign label colors. It selects the 64 modern population averages that are the closest to the population average of Nenetses.



    Code:
    library("factoextra")
    library("ggplot2")
    library("colorspace")
    
    download.file("https://drive.google.com/uc?export=download&id=1wZr-UOve0KUKo_Qbgeo27m-CQncZWb8y","modernave")
    
    t<-read.csv("modernave",header=T,row.names=1)
    t2<-t[names(head(sort(apply(t,1,function(x)sqrt(sum((x-t["Nenets",])^2)))),65)),]
    k<-hkmeans(t2,16)
    p<-prcomp(t2)
    
    pct<-paste0(colnames(p$x)," (",round(p$sdev/sum(p$sdev)*100,1),"%)")
    
    ggplot(p$x,aes(x=PC2,y=PC1,label=rownames(t2)))+
    theme_dark()+
    geom_text(aes(color=as.factor(k$cluster)),size=2.5,show.legend=F)+
    theme(
      axis.ticks.x=element_blank(),
      axis.ticks.y=element_blank(),
      plot.background=element_rect(fill="gray35"),
      panel.background=element_rect(fill="gray20"),
      panel.grid.major=element_line(color="gray35"),
      panel.grid.minor=element_blank(),
      text=element_text(color="gray10"),
      axis.text=element_text(color="gray10")
    )+
    scale_x_continuous(breaks=seq(-1,1,.1),expand=expansion(mult=.17))+
    scale_y_continuous(breaks=seq(-1,1,.1),expand=expansion(mult=.02))+
    coord_fixed(ratio=p$sdev[1]/p$sdev[2],expand=T)+
    xlab(pct[2])+ylab(pct[1])+
    scale_color_discrete_qualitative(palette="Set 3")
    ggsave("output.png")
    Last edited by Komintasavalta; 02-22-2021 at 11:42 PM.

  7. #7
    Banned
    Join Date
    Sep 2020
    Last Online
    09-12-2023 @ 03:47 PM
    Location
    コミ共和国
    Meta-Ethnicity
    Finno-Permic
    Ethnicity
    Peasant
    Ancestry
    コミ
    Country
    Finland
    Taxonomy
    Karaboğa (euryprosopic, platyrrhine, dolichocephalic)
    Relationship Status
    Virgin
    Gender
    Posts
    2,170
    Thumbs Up
    Received: 4,862
    Given: 2,946

    3 Not allowed!

    Default

    Here's the 64 closest ancient individuals to the Ymyyakhtakh population average. Ymy is in a world of its own together with kra001 who Davidski now says is "Proto-Uralic".



    The closest modern matches to Ymyyakhtakh are Yukaghirs and Nganasans:

    $ awk -F, 'NR==FNR{for(i=2;i<=NF;i++)a[i]=$i;next}$1!=""{s=0;for(i=2;i<=NF;i++)s+=($i-a[i])^2;print s^.5,$1}' <(grep Ymy mui/g25/ancientave) mui/g25/modern|sort -n|head -n16|awk '{printf"%.3f %s\n",$1,$2}'|sed s,^0,,
    .052 Yukagir_Tundra:Ckrd18
    .053 Yukagir_Tundra:Yuk_021
    .059 Yukagir_Tundra:Yuk_009
    .061 Yukagir_Tundra:Yuk_022
    .063 Yukagir_Tundra:Yuk_019
    .063 Nganassan:Nganassan10
    .066 Nganassan:Nganassan8
    .066 Nganassan:Nganassan2
    .067 Yukagir_Tundra:Yuk_023
    .068 Selkup:Selkup87
    .068 Nganassan:Nganassan15
    .070 Nganassan:Nganassan3
    .071 Nganassan:Nganassan1
    .072 Evenk:GS000014336
    .072 Yukagir_Tundra:Yuk_011
    .072 Evenk:GS000014335

Thread Information

Users Browsing this Thread

There are currently 1 users browsing this thread. (0 members and 1 guests)

Similar Threads

  1. NEW! G25 Maps (Euclidean and Correlation versions)
    By Lucas in forum Autosomal DNA
    Replies: 27
    Last Post: 08-01-2020, 09:10 PM
  2. IBS / IBS SC Asian distances
    By Zoro in forum Genetics
    Replies: 93
    Last Post: 07-10-2020, 08:56 PM
  3. What are good distances to get on G25?
    By FilhoV in forum Autosomal DNA
    Replies: 0
    Last Post: 03-15-2020, 03:45 PM
  4. Replies: 11
    Last Post: 08-24-2019, 06:27 AM
  5. Euclidean distances from NW Europeans, from K36
    By Creoda in forum Autosomal DNA
    Replies: 0
    Last Post: 11-23-2018, 11:47 AM

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •