Thread: Visualizing an ADMIXTURE run as a polygonal diagram

  1. #11

    Something like a Dirichlet-type distribution?

    I once found a very good image for understanding PCA in multivariate statistics in a simple and intuitive way, but I can't find it right now.

  2. #12

    Very neat-looking! What software did you use to create these diagrams? R ???

  3. #13

    Quote Originally Posted by vbnetkhio View Post
    I tried something like this recently:

    Code:
    # admixture proportions: samples in rows, components in columns
    # (pass sep="," as well if the files really are comma-separated)
    admix <- as.matrix(read.table("results.csv", header = TRUE, row.names = 1))
    # FST distances between the components
    fst <- as.matrix(read.table("fst_distances.csv", header = TRUE, row.names = 1))

    # multiply the proportions by the FST matrix
    scaled <- admix %*% fst

    write.table(scaled, file = "fst_scaled.txt")

    I didn't like the result. Basically all Europeans end up more similar to each other, and some Hungarians with a tiny bit of Asian were bigger outliers.

    That's exactly what I thought about doing, but I didn't think it would be that simple. It actually worked, though. I made a new version of the Dodecad K12b graph from my previous post, where I multiplied the matrix of admixture percentages by the FST matrix. It reduced the number of clusters in Europe and the Caucasus, because the North_European, Atlantic_Med, and Caucasus components have low FST distances to each other. As expected, it increased the number of clusters in Africa.

    Previously the three closest neighbors of Selkups were Kets, Dolgans, and Yukaghirs, because they all have a high proportion of the Siberian component, which is an Nganasan-like central-north Siberian component. Selkups were relatively far from Siberian populations with low Siberian and high Southeast_Asian, like Altaians. After multiplying by the FST matrix, Selkups became closer to southern Siberians like Altaians.

    My script also uses matrix multiplication to calculate the coordinates inside the polygon. For example, these are the corners of an equilateral triangle centered at the origin with radius 1:

    Code:
    > triangle=sapply(c(sin,cos),function(x)head(x(seq(0,2,length.out=3+1)*pi),-1))
    > triangle
               [,1] [,2]
    [1,]  0.0000000  1.0
    [2,]  0.8660254 -0.5
    [3,] -0.8660254 -0.5

    These were the admixture proportions in one K=3 run:

    Code:
    > admix=read.table(text="Saami.DG 0.238969 0.761021 0.000010\nMansi.DG 0.534995 0.464994 0.000010",row.names=1)

    Then the x and y coordinates inside the triangle would be these:

    Code:
    > as.matrix(admix)%*%triangle
                  [,1]       [,2]
    Saami.DG 0.6590549 -0.1415465
    Mansi.DG 0.4026880  0.3024930
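
    The same idea extends to any number of components. This is not the actual script, just a minimal sketch that assumes the corners are spaced evenly on a unit circle like in the triangle above. The function names `polygon_corners` and `to_xy` are made up for this example.

    ```r
    # Corners of a regular K-gon on the unit circle, first corner at the top
    # (hypothetical helper that generalizes the triangle example to any K)
    polygon_corners <- function(k) {
      angle <- head(seq(0, 2 * pi, length.out = k + 1), -1)
      cbind(x = sin(angle), y = cos(angle))
    }

    # Each sample lands at the average of the corners, weighted by its proportions
    to_xy <- function(admix, corners = polygon_corners(ncol(admix))) {
      as.matrix(admix) %*% corners
    }

    # The two K=3 rows from the example above
    admix <- read.table(text = "Saami.DG 0.238969 0.761021 0.000010\nMansi.DG 0.534995 0.464994 0.000010", row.names = 1)

    xy <- to_xy(admix)
    print(round(xy, 4))
    ```

    Because every row of proportions sums to 1, each point is a weighted average of the corners and so always falls inside the polygon.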

  4. #14

    Aaah, so that's what you're doing.

    It's also how "location predictors" like these work:

    https://gen3553.pagesperso-orange.fr/ADN/Europe.htm
    https://gen3553.pagesperso-orange.fr/ADN/K15.htm

    Here I scaled K36 averages into real-life geographic coordinates:
    https://www.theapricity.com/forum/sh...uot-PCA-quot-)

    (North Atlantic is scaled to London, Italian to Rome, etc.)
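
    A predictor of that kind can be sketched the same way as the polygon: swap the corners for one longitude and latitude per component. Only the London and Rome anchors come from the post above. The component labels are approximate and the sample proportions are invented for illustration.

    ```r
    # One anchor point per component (longitude, latitude in degrees)
    anchors <- rbind(
      North_Atlantic = c(lon = -0.13, lat = 51.51),  # London
      Italian        = c(lon = 12.50, lat = 41.90)   # Rome
    )

    # Made-up proportions for two hypothetical samples; each row sums to 1
    admix <- rbind(
      sample_A = c(North_Atlantic = 0.75, Italian = 0.25),
      sample_B = c(North_Atlantic = 0.20, Italian = 0.80)
    )

    # Predicted position = anchors averaged with the proportions as weights
    predicted <- admix %*% anchors
    print(round(predicted, 2))
    ```

    A real calculator has many more components, but the multiplication stays the same: one row of anchors per component.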

  5. #15

    Quote Originally Posted by Peterski View Post
    Very neat-looking! What software did you use to create these diagrams? R ???

    Yeah.

    On macOS, you can run my scripts like this:

    Code:
    brew install r
    brew install udunits # needed by ggforce
    R -e 'install.packages(c("tidyverse","ggforce","ggrepel"),repos="https://cloud.r-project.org")'
    Rscript path/to/script.R

    I think a lot of Windows users just use the RStudio IDE: https://www.rstudio.com. But I hate GUIs, so I use R from Emacs.


  6. #16

    This could be a good thread for posting this link without it being completely off topic, since it brings together several people with some interest in this type of "thing":


  7. #17

    Here's Dodecad K7b.

    When I calculated the clusters and the nearest neighbors this time, I multiplied the matrix of admixture percentages by the square root of the matrix of FST distances between the admixture components. When I didn't take the square root of the FST distances, the effect seemed too radical, and even Saudis ended up in the same cluster as Finns.
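
    As a rough sketch of that step, assuming the square root is taken element by element (the post doesn't say which kind of matrix square root was used), with component names, population names, and FST values all invented for illustration:

    ```r
    # Made-up FST distances between three hypothetical components
    fst <- matrix(c(0.00, 0.02, 0.10,
                    0.02, 0.00, 0.11,
                    0.10, 0.11, 0.00),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

    # Made-up admixture proportions; each row sums to 1
    admix <- rbind(pop1 = c(A = 0.9, B = 0.1, C = 0.0),
                   pop2 = c(A = 0.1, B = 0.9, C = 0.0),
                   pop3 = c(A = 0.0, B = 0.1, C = 0.9))

    plain  <- dist(admix)                # distances on the raw proportions
    linear <- dist(admix %*% fst)        # proportions multiplied by FST
    rooted <- dist(admix %*% sqrt(fst))  # multiplied by element-wise sqrt(FST)

    print(round(as.matrix(rooted), 3))
    ```

    In this toy example pop1 and pop2 differ mainly in components A and B, which are close in FST, so both weighted versions pull the two together relative to pop3. The square-root version does so less strongly than the raw FST values.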


  8. #18

    Quote Originally Posted by Komintasavalta View Post
    Yeah.

    Did you ever try calculating FST distances between populations with smartpca?

    I tried it just now and got this in the log file:

    "population: 0 Case 3450"

    It recognizes 0 populations for some reason.

  9. #19

    Quote Originally Posted by vbnetkhio View Post
    Did you ever try calculating FST distances between populations with smartpca?

    I tried it just now and got this in the log file:

    "population: 0 Case 3450"

    It recognizes 0 populations for some reason.

    You need to add population numbers to the sixth field of the fam file: https://www.biostars.org/p/266511/. The commands below use integers starting from 10 as group identifiers, because the numbers 1, 2, and 9 have a special meaning (1 assigns the line as a case, 2 assigns it as a control, and 9 ignores it).

    `phylipoutname: fstfilename` saves an FST matrix to a file, but the FST values in the file only have three digits after the decimal point. There's also the undocumented parameter `fsthiprecision: YES`, which causes the FST values printed to STDOUT to be multiplied by a million instead of a thousand, but it doesn't affect the contents of the `phylipoutname` file.

    If an FST run includes more than 100 populations, SmartPCA exits with an error unless you include a parameter like `maxpops: 1000`.

    Code:
    x=uralic # prefix for the output files
    # for each value of field 3 of the anno file, keep the row with the largest field 15; then save the sample ID and population of the samples in the listed populations
    sed 1d v44.3_HO_public.anno|sort -t$'\t' -rnk15|awk -F\\t '!a[$3]++{print$2,$8}'|awk 'NR==FNR{a[$0];next}$2 in a' <(printf %s\\n Besermyan Enets Estonian Finnish Hungarian Karelian Mansi Mordovian Nganasan Saami.DG Selkup Udmurt Veps) ->$x.pick
    # extract the picked samples into a new PLINK fileset
    plink --allow-no-sex --bfile g/p/ho --keep <(awk 'NR==FNR{a[$1];next}$2 in a' $x.pick v44.3_HO_public.fam) --make-bed --out $x
    # number the populations from 10 upward and write the number to the sixth field of the fam file
    awk '!a[$2]++{i++}{print$1,i}' <(sort -k2 $x.pick)|awk 'NR==FNR{a[$1]=$2;next}{$6=a[$2]+9}1' - $x.fam>$x.famtemp;mv $x.fam{temp,}
    # run SmartPCA in FST-only mode and keep a copy of its output
    smartpca -p <(printf %s\\n genotypename:\ $x.bed snpname:\ $x.bim indivname:\ $x.fam fstonly:\ YES fsthiprecision:\ YES)|tee $x.smartpca
    # population names in the order in which they first appear in the fam file
    p=$(awk 'NR==FNR{a[$1]=$2;next}{print a[$2]}' $x.{pick,fam}|awk '!a[$0]++')
    # cut the FST matrix out of the SmartPCA output and save it as CSV with population names as row and column names
    sed -n '/fst \*1000000/,/^$/p' $x.smartpca|sed 1,2d|sed \$d|tr -s ' ' ,|cut -d, -f3-|paste -d, <(printf %s\\n "$p") -|cat <(printf %s\\n '' "$p"|paste -sd,) ->$x.fst

    Maybe you're supposed to do LD pruning before calculating FST, because Kerminen et al. 2021 said this: "We calculated pairwise-FST between the reference groups (Fig 2) and the ancestor candidate groups (S9 Fig) using SmartPCA of EIGENSOFT package[7] (fstonly: YES, fsthiprecision: YES) and 56,661 LD-independent variants."
    Last edited by Komintasavalta; 05-09-2021 at 10:39 PM. Reason: Fixed code so population names are printed in right order
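
    The resulting file could then be read into R along these lines. This is only a sketch: it writes a tiny stand-in file with invented values in the same layout (empty first header field, population names as row names, FST multiplied by a million), so the numbers are not real results.

    ```r
    # Stand-in for the .fst file written by the commands above (values are invented)
    f <- tempfile(fileext = ".fst")
    writeLines(c(",Finnish,Saami.DG,Nganasan",
                 "Finnish,0,12345,98765",
                 "Saami.DG,12345,0,87654",
                 "Nganasan,98765,87654,0"), f)

    # check.names = FALSE keeps the population names exactly as written
    fst <- as.matrix(read.csv(f, row.names = 1, check.names = FALSE)) / 1e6

    print(fst)
    ```

    Dividing by 1e6 undoes the scaling from `fsthiprecision: YES`, so the values are back on the usual 0 to 1 scale before they are multiplied with the admixture proportions.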

  10. #20

    Would there be any problems with calculating FST this way?

    Run supervised ADMIXTURE, assign each sample to its population, and use as many K as there are populations. The FST values then get written to the output.

    http://dalexander.github.io/admixtur...ure-manual.pdf

    SmartPCA's version takes forever, so could this actually be faster? There are no unassigned samples, so only the allele frequencies and FST will be calculated.
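
    For what it's worth, as far as I understand the ADMIXTURE manual, a supervised run reads a `.pop` file with one population label per line, in the same order as the `.fam` file. A minimal sketch of preparing that file, where the two small tables are invented stand-ins for the `.pick` and `.fam` files and the column names are made up:

    ```r
    # Invented stand-ins for the sample/population list and the fam file
    pick <- data.frame(id = c("S1", "S2", "S3"),
                       pop = c("Saami.DG", "Mansi", "Saami.DG"))
    fam <- data.frame(fid = c("F2", "F3", "F1"), iid = c("S2", "S3", "S1"))

    # One population label per line, in the same order as the fam file
    labels <- pick$pop[match(fam$iid, pick$id)]
    pop_file <- file.path(tempdir(), "uralic.pop")
    writeLines(labels, pop_file)

    # K is then the number of distinct population labels
    k <- length(unique(labels))
    cat("K =", k, "\n")
    # ADMIXTURE would then be run with --supervised, the matching .bed file, and K = k
    ```

    Whether the FST values from such a run are comparable to SmartPCA's is a separate question that this sketch doesn't address.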
