
Thread: Visualizing an ADMIXTURE run as a polygonal diagram

  1. #31
    You should make a GitHub page with all those scripts. Then it could be officially used by others in their papers.

  2. #32
    If you do a non-SSA ADMIXTURE run with two components, you can estimate the amount of East Eurasian ancestry in different populations:

    Code:
    $ sort -rnk2 maalima2.i.2a|awk '{printf"%.1f %s\n",100*$2,$1}'
    100.0 Zhuang
    100.0 Tujia
    100.0 Tibetan_Yunnan
    100.0 She
    100.0 Qiang
    100.0 Nivh
    100.0 Negidal
    100.0 Naxi
    100.0 Nanai
    100.0 Mulam
    100.0 Miao
    100.0 Maonan
    100.0 Li
    100.0 Korean
    100.0 Gelao
    100.0 Dong
    100.0 Ami
    100.0 Atayal
    99.9 Dai
    99.9 Han
    99.9 Japanese
    99.8 Vietnamese
    99.8 Yi
    99.8 China_Lahu
    99.8 Ulchi
    99.7 Kankanaey
    99.7 Murut
    99.4 Kinh
    99.2 Hezhen
    98.7 Ilocano
    98.4 Oroqen
    98.3 Sherpa
    98.2 Xibo
    98.1 Dusun
    97.7 Daur
    97.6 Tibetan
    97.4 Yugur
    97.0 Mongola
    96.5 Rai
    95.5 Yukagir_Tundra
    94.9 Nganasan
    94.8 Bonan
    94.6 Koryak
    94.5 Visayan
    94.5 Tu
    94.0 Evenk_Transbaikal
    93.8 Itelmen
    93.6 Gurung
    93.2 Chukchi
    93.2 Tagalog
    92.0 Chukchi1
    91.4 Eskimo_ChaplinSireniki
    90.9 Eskimo_Naukan
    90.3 Salar
    89.7 Cambodian
    89.7 Khamnegan
    89.6 Thai
    89.4 Dungan
    88.6 Dongxiang
    88.6 Magar
    88.6 Malay
    88.5 Yakut
    87.3 Todzin
    87.1 Buryat
    86.9 Tamang
    86.8 Burmese
    86.2 Mongol
    83.7 Dolgan
    83.5 Tofalar
    83.4 Evenk_FarEast
    83.2 Tuvinian
    83.2 Karitiana
    82.0 Kalmyk
    81.2 Piapoco
    81.2 Mixe
    79.7 Surui
    78.5 Pima
    78.5 Kusunda
    77.2 Zapotec
    76.2 Mixtec
    76.0 Enets
    75.7 Kazakh_China
    74.7 Khakass_Kachin
    74.2 Altaian
    72.7 Bolivian
    71.9 Mayan
    71.6 Kyrgyz_China
    70.8 Quechua
    70.8 Nasioi
    69.2 Kyrgyz_Kyrgyzstan
    68.7 Tharu
    68.7 Ket
    68.7 Even
    68.0 Kyrgyz_Tajikistan
    67.9 Papuan
    67.7 Khakass
    66.1 Newar
    65.3 Selkup
    63.7 Kazakh
    63.3 Shor_Khakassia
    62.6 Shor_Mountain
    62.6 Tubalar
    62.6 Australian
    58.1 Altaian_Chelkan
    56.5 Karakalpak
    55.1 Hazara
    54.4 Uyghur
    54.0 Nogai_Astrakhan
    52.7 Mansi
    51.7 Tatar_Siberian_Zabolotniye
    48.7 Nogai_Stavropol
    47.8 Tatar_Siberian
    46.9 Yukagir_Forest
    46.6 Tlingit
    42.2 Bahun
    42.0 Bengali
    39.1 Uzbek
    36.9 Aleut
    35.5 Bashkir
    35.4 Turkmen
    33.2 Punjabi
    32.0 GujaratiD
    30.1 GujaratiC
    29.1 Burusho
    28.3 Udmurt
    27.1 GujaratiB
    26.3 Nogai_Karachay_Cherkessia
    24.9 Besermyan
    23.5 GujaratiA
    23.4 Jew_Cochin
    23.4 Chuvash
    22.1 Sindhi_Pakistan
    21.3 Tatar_Kazan
    19.7 Tajik
    19.6 Pathan
    18.5 Kalash
    15.6 Tatar_Mishar
    15.5 Russian_Archangelsk_Leshukonsky
    14.3 Balochi
    13.9 Turkish_Balikesir
    13.6 Brahui
    13.1 Russian_Archangelsk_Pinezhsky
    11.0 Makrani
    10.6 Abazin
    10.4 Kabardinian
    10.4 Veps
    9.6 Russian_Archangelsk_Krasnoborsky
    9.5 Karachai
    9.3 Balkar
    9.1 Karelian
    8.7 Circassian
    8.4 Azeri
    8.3 Mordovian
    8.0 Finnish
    8.0 Ossetian
    7.1 Kumyk
    7.1 Ezid
    6.4 Turkish
    6.1 Adygei
    6.0 Ingushian
    5.9 Iranian
    5.8 Russian
    5.1 Lak
    4.9 Avar
    4.9 Tabasaran
    4.8 Chechen
    4.6 Lezgin
    4.6 Darginian
    4.5 Kaitag
    4.3 Kubachinian
    3.3 Kurd
    2.9 Estonian
    2.6 Abkhasian
    2.5 Belarusian
    2.1 Gagauz
    1.6 Ukrainian
    1.6 Ukrainian_North
    1.6 Lithuanian
    1.3 Hungarian
    1.1 Lebanese
    1.1 Georgian
    1.0 Jew_Iranian
    1.0 Moldavian
    0.9 Czech
    0.8 Jew_Georgian
    0.7 Norwegian
    0.7 Syrian
    0.7 Jew_Ashkenazi
    0.7 Yemeni_Desert
    0.6 Jordanian
    0.6 Assyrian
    0.6 Lebanese_Muslim
    0.6 Bulgarian
    0.5 Armenian
    0.5 Armenian_Hemsheni
    0.4 Croatian
    0.4 Saudi
    0.3 Yemeni_Northwest
    0.3 BedouinA
    0.3 Yemeni_Highlands
    0.3 French
    0.3 Egyptian
    0.3 Romanian
    0.3 English
    0.2 Icelandic
    0.2 Lebanese_Christian
    0.2 Maltese
    0.2 Scottish
    0.2 Palestinian
    0.2 Greek
    0.2 Orcadian
    0.1 Italian_North
    0.1 Italian_South
    0.1 Druze
    0.1 Jew_Turkish
    0.1 Albanian
    0.1 Spanish
    0.0 Jew_Iraqi
    0.0 Jew_Moroccan
    0.0 Basque
    0.0 Jew_Yemenite
    0.0 Spanish_North
    0.0 Sicilian
    0.0 Sardinian
    0.0 Jew_Tunisian
    0.0 Jew_Libyan
    0.0 Cypriot
    0.0 Canary_Islander
    0.0 BedouinB
    I first did a global K=3 run of modern samples, selecting the samples whose years BP field in the anno file was 0:

    Code:
    curl -LsO reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V44/V44.3/SHARE/public.dir/v44.3_HO_public.tar;tar -xf v44.3_HO_public.tar
    f=v44.3_HO_public;convertf -p <(printf %s\\n genotypename:\ $f.geno snpname:\ $f.snp indivname:\ $f.ind outputformat:\ PACKEDPED genotypeoutname:\ $f.bed snpoutname:\ $f.bim indivoutname:\ $f.fam)
    igno()(grep -Ev '\.REF|rel\.|fail\.|Ignore_|_dup|_contam|_lc|_father|_mother|_son|_daughter|_brother|_sister|_sibling|_twin|Neanderthal|Denisova|Vindija_light|Gorilla|Macaque|Marmoset|Orangutang|Primate_Chimp|hg19ref')
    x=maalima;awk -F\\t 'NR>1{print$2,$8}' v44.3_HO_public.anno|igno|grep -Ev '\.(SG|SDG|DG|WGA)'|grep -v _o|cut -d' ' -f1|awk -F\\t 'NR==FNR{a[$0];next}$2 in a&&$6==0&&(!a[$3]++){print$2,$8}' - v44.3_HO_public.anno>$x.pick
    plink --allow-no-sex --bfile v44.3_HO_public --keep <(awk 'NR==FNR{a[$1];next}$2 in a' $x.pick v44.3_HO_public.fam) --make-bed --out $x
    plink --allow-no-sex --bfile $x --genome --out $x
    awk 'FNR>1&&$10>=.3{print$2<$4?$2:$4}' $x.genome|awk 'NR==FNR{a[$0];next}!($1 in a)' - $x.pick>$x.i.pick
    plink --allow-no-sex --bfile v44.3_HO_public --keep <(awk 'NR==FNR{a[$1];next}$2 in a' $x.i.pick v44.3_HO_public.fam) --make-bed --out $x.i
    plink --allow-no-sex --bfile $x.i --indep-pairwise 50 10 .01 --out $x.i
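    # the pruning step above was run with --out $x.i, so its list is $x.i.prune.in
    plink --bfile $x.i --extract $x.i.prune.in --make-bed --out $x.i.p
    # per-population column means; the END loop goes over n, because the keys of a are "name SUBSEP column"
    tav()(awk '{n[$1]++;for(i=2;i<=NF;i++){a[$1,i]+=$i}}END{for(i in n){o=i;for(j=2;j<=NF;j++)o=o FS sprintf("%f",a[i,j]/n[i]);print o}}' "FS=${1-$'\t'}")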
    k=3;admixture -j4 -C .1 $x.i.p.bed $k;paste -d' ' <(awk 'NR==FNR{a[$1]=$2;next}{print$2,a[$2]}' $x.i.pick $x.i.p.fam) $x.i.p.$k.Q>$x.$k;cut -d' ' -f2- $x.$k|tav \ >$x.$k.ave
    Then I selected samples that had less than 20% of the SSA component (excluding Australians and Papuans) and I did a new K=2 run.
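    A minimal sketch of that filtering step, assuming a per-sample table laid out like `$x.$k` above (sample ID, population, then one column per component). Which column holds the SSA-like component differs between runs, so it is passed as an argument, and `keep_low_ssa` is a made-up helper name. It also reads "excluding Australians and Papuans" as "exempt from the cutoff", since both populations still show up in the K=2 list; that reading is an assumption.

```shell
# keep_low_ssa COLUMN CUTOFF: hypothetical helper that reads rows of
# "sample population q1 q2 ..." on stdin and prints the sample ID and
# population of every row whose value in COLUMN is below CUTOFF.
# Rows from the two exempt populations are always kept (an assumption).
keep_low_ssa() {
  awk -v c="$1" -v t="$2" '
    $2 == "Australian" || $2 == "Papuan" || $c + 0 < t + 0 { print $1, $2 }'
}

# Example with invented numbers; column 5 plays the role of the SSA-like component.
printf '%s\n' \
  'S1 Yoruba 0.01 0.02 0.97' \
  'S2 Finnish 0.90 0.09 0.01' \
  'S3 Papuan 0.30 0.40 0.30' \
  'S4 Egyptian 0.70 0.05 0.25' |
  keep_low_ssa 5 .2
```

    The output has the same two columns as `$x.pick`, so it could be passed to the `plink --keep` line in the same way as before.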

    Next I did a K=3 run of the non-SSA samples. Even though I excluded some North African populations with the highest SSA ancestry, the third component still became an SSA-like or basal-like component, where both Egyptians and Papuans have about 50% of the third component. Maltese have 22% of the third component and Greeks have 9%. Even Thais have 6% of the third component.



    At K=4, Americans and Siberians split off from East-Southeast Asians. Kets have 70% of the American-Siberian component because they have so much ANE.

    The clustering and nearest neighbors are based on just the runs at K=2, K=3, and K=4, because the K=5 run has already taken more than an hour. Maybe I should've done more aggressive LD pruning, because almost 100,000 SNPs remained even after `--indep-pairwise 50 10 .03`.



    Sorry for all these huge images, but regular-size images look like crap on a retina display.

    Quote Originally Posted by Lucas View Post
    You should make a GitHub page with all those scripts. Then it could be officially used by others in their papers.
    I already deleted my websites and my Github account years ago, and I decided that I was no longer going to make any contributions to the world.

    Github is too gay and post web 2.0 anyway. Oldschool static websites are nicer.

    Also I don't think they would like to use a script that says `set.seed(1488)`.
    Last edited by Komintasavalta; 05-10-2021 at 10:07 PM.

  3. #33
    I had to leave it running overnight, but the runs at K=6, K=7, and K=8 have now finished.

    At K=7, I got a component that is similar to the Gedrosia component in Dodecad K7b. It is maximal in Kalash, Brahui, Sindhi_Pakistan, and Balochi. In the official K7b spreadsheet, the Gedrosia component is maximal in Brahui, Balochi, Makrani, and Sindhi.
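    To list where a component peaks, the per-population averages can be sorted on that component's column. A rough sketch, assuming a space-separated table like the `.ave` files from the earlier post (population name, then one column per component); `top_pops` and its arguments are names made up for illustration:

```shell
# top_pops FILE COMPONENT N: hypothetical helper that prints the N populations
# with the highest share of component number COMPONENT, as percentages.
# FILE is assumed to hold "population q1 q2 ..." with space-separated columns.
top_pops() {
  col=$(( $2 + 1 ))   # component 1 sits in column 2, after the population name
  LC_ALL=C sort -k"$col,$col"nr "$1" | head -n "$3" |
    awk -v c="$col" '{ printf "%.1f %s\n", 100 * $c, $1 }'
}
```

    For example, `top_pops maalima2.i.7a 3 4` would print the four populations with the most of the third component, if the file for the K=7 run follows the same naming as the K=2 one (the file name here is a guess).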

    The European component is the highest in Lithuanians (96%) but it's the fifth highest in Spanish_North (94%) and the eighth highest in Basques (93%). In ADMIXTURE models where Southwestern Europeans have a high proportion of a European component, usually Uralic people have fairly high Mongoloid ancestry, and here also the proportion of the Nganasan component is 10% in Finns, 13% in Vepsians, and 30% in Udmurts.

    Based on the links to the three nearest neighbors, there is a path from Finns to Mongols: first from Finnish to Veps, then to Tatar_Mishar, Tatar_Kazan, Chuvash, Udmurt, Aleut, Tlingit, Mansi, Altaian_Chelkan, Tubalar, Khakass, Altaian, Evenk_FarEast, Kalmyk, and then to Mongol. I didn't realize it until recently, but there is actually a huge genetic gap produced by the Gobi Desert, where Khalkha Mongols have a high genetic distance to Han and northern Chinese ethnicities. It is also visible in this image, where there is no line that connects Mongols to Hans, apart from lines that go through South Asians or Australians. However, my method for calculating the nearest neighbors could still be improved, because right now one of the three closest neighbors of Australians is Karakalpaks.
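    For reference, a bare-bones way to get the k nearest neighbors of each population is to take plain Euclidean distances between the rows of a table of admixture proportions. This is only a sketch under that assumption and not necessarily how the image was made; `knn` is a hypothetical helper, and the input layout (population name followed by numeric columns) is assumed.

```shell
# knn K: hypothetical helper. Reads rows of "name v1 v2 ..." on stdin and prints
# each name followed by its K nearest neighbors by Euclidean distance,
# nearest first (ties go to the row that comes first in the input).
knn() {
  awk -v k="$1" '
    { name[NR] = $1; nf = NF; for (j = 2; j <= NF; j++) v[NR, j] = $j }
    END {
      n = NR
      for (a = 1; a <= n; a++) {
        for (b = 1; b <= n; b++) {
          s = 0
          for (j = 2; j <= nf; j++) { d = v[a, j] - v[b, j]; s += d * d }
          dist[b] = sqrt(s)
          used[b] = (b == a)   # never report a row as its own neighbor
        }
        out = name[a]
        for (r = 1; r <= k && r < n; r++) {
          best = 0
          for (b = 1; b <= n; b++)
            if (!used[b] && (best == 0 || dist[b] < dist[best])) best = b
          used[best] = 1
          out = out " " name[best]
        }
        print out
      }
    }'
}
```

    With raw Euclidean distance, a population that sits far from everything else still gets three neighbors, so an odd link like Australian to Karakalpak can simply mean that nothing in the table is close. Dropping links above some distance cutoff would be one way to handle that, though the cutoff itself would be a judgment call.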


  4. #34
    Structure makes this triangle straight from the genome data.

  5. #35
    Quote Originally Posted by Komintasavalta View Post
    If you do a non-SSA ADMIXTURE run with two components, you can estimate the amount of East Eurasian ancestry in different populations: [...]
    Good job with graphing! I like how you continuously try to find new ways to visualize your results. Keep on doing what you do.

    Here are a few notes on your K3 unsupervised ADMIXTURE run:

    - Clustering should not necessarily be interpreted as shared genetic drift in the last 30,000 years, or as gene flow. The classic example is Neanderthal clustering with SSA in ADMIXTURE. Therefore, SNP ascertainment substantially skews results.

    - You’ll notice problems with SNP ascertainment in the Reich dataset, with Turkmen, Tatars, Altaians, and a couple of others not clustering as expected (maybe that’s why you didn’t include them?).

    - Papuans having 50% of the orange component along with Egyptians and Bedouin should not be interpreted as orange being Basal Eurasian, because Papuans should not be scoring that much Basal Eurasian. The orange component may be nothing more than clustering due to very ancient, million-year-old alleles or some other non-meaningful SNP artifacts.

    - Ezidi Kurds having higher E. Eurasian than Kurmanji or Sorani Kurds would not make sense in my experience, since they don’t have as much Central Asian input as other Kurds.

    If you want to make a K3 based on East and West Eurasian, my suggestion is a supervised run using good LBK samples as West Eurasian proxies.
    Last edited by Zoro; 05-11-2021 at 10:47 AM.

  6. #36
    I've now figured out how to use the `circlize` package to draw a circular stacked bar chart: https://jokergoo.github.io/circlize_book/book/. Next I'll try to learn how to add a thin bar for each individual sample within a population.



    Code:
    library(circlize)
    library(vegan) # for reorder.hclust (may be masked by the package seriation)
    library(dendextend) # for color_branches
    
    f="uralaltaic.i"
    kvals=c(3,7)
    columnorder=list(c(3,2,1),c(1,5,4,3,6,7,2))
    
    mats=sapply(kvals,function(x)read.table(paste0(f,".",x,"a"),r=1)[,columnorder[lapply(columnorder,length)==x][[1]]])
    
    joined=do.call(cbind,sapply(Sys.glob(paste0(f,".[0-9]a")),function(x)read.table(x,r=1)))
    dist=as.data.frame(as.matrix(dist(joined)))
    hc=hclust(dist(joined))
    
    hc=reorder(hc,dist[,"Nganasan"]-dist[,"Estonian"])
    # hc=reorder(hc,mats[[1]][,3]-mats[[1]][,1])
    # maxdist=which(dist==max(dist))[1];hc=reorder(hc,dist[,maxdist%%nrow(dist)]-dist[,maxdist%/%nrow(dist)+1])
    
    labelcolor=hcl(c(260,120,60,0,220,160,310,90)+15,60,70)
    barcolor=list(hcl(c(220,120,310)+15,60,70),hcl(c(220,60,120,0,270,90,310)+15,60,70))
    
    labels=hc$labels[hc$order]
    cut=cutree(hc,8)
    dend=color_branches(as.dendrogram(hc),k=length(unique(cut)),col=labelcolor[unique(cut[labels])])
    
    circos.clear()
    png("a.png",w=2500,h=2500,res=300)
    circos.par(cell.padding=c(0,0,0,0))
    circos.initialize(0,xlim=c(0,nrow(mats[[1]])))
    
    circos.track(ylim=c(0,1),bg.border=NA,track.height=.2,track.margin=c(.005,0),panel.fun=function(x,y)
      for(i in 1:nrow(mats[[1]]))circos.text(i-.5,0,labels[i],adj=c(0,.5),facing="clockwise",niceFacing=T,cex=.65,col=labelcolor[cut[labels[i]]])
    )
    
    for(j in length(mats):1)circos.track(ylim=c(0,1),track.height=.25,track.margin=c(0,.01),bg.lty=0,panel.fun=function(x,y){
      mat=as.matrix(mats[[j]][hc$order,])
      pos=1:nrow(mat)-.5
      barwidth=1
      for(i in 1:ncol(mat)){
        seq1=rowSums(mat[,seq(i-1),drop=F])
        seq2=rowSums(mat[,seq(i),drop=F])
        circos.rect(pos-barwidth/2,if(i==1){0}else{seq1},pos+barwidth/2,seq2,col=barcolor[[j]][i],border="gray20",lwd=.1)
      }
      for(i in 1:ncol(mat)){
        seq1=rowSums(mat[,seq(i-1),drop=F])
        seq2=rowSums(mat[,seq(i),drop=F])
        lab=round(100*mat[,i])
        lab[lab<=1]=""
        circos.text(pos,if(i==1){seq1/2}else{seq1+(seq2-seq1)/2},labels=lab,col="gray10",cex=.5,facing="downward")
      }
    })
    
    circos.track(ylim=c(0,attr(dend,"height")),track.height=.25,track.margin=c(0,.0015),bg.border=NA,panel.fun=function(x,y)circos.dendrogram(dend))
    
    circos.clear()
    dev.off()
    Quote Originally Posted by Lemminkäinen View Post
    Structure makes this triangle straight from the genome data.
    I guess you mean this (from the Structure manual, https://web.stanford.edu/group/pritc...ucture_doc.pdf):



    I thought that this was something that was invented by me...
    Last edited by Komintasavalta; 05-12-2021 at 07:09 PM.

  7. #37
    Quote Originally Posted by Zoro View Post
    If you want to make a K3 based on East and West Eurasian, my suggestion is a supervised run using good LBK samples as West Eurasian proxies.
    I haven't been very successful in making supervised runs where I have only used ancient samples as references. But if I do an unsupervised run with the right mixture of modern and ancient samples, I can get components for WHG and LBK at a relatively low K value.

    The image below shows two ADMIXTURE runs at K=3 and K=6, where I included modern samples with the suffix `.DG`, and I included ancient samples with over 500,000 SNPs and with mean age BP over 6,000. I used `--indep-pairwise 50 10 .05` which kept 72,163 SNPs.

    Now I don't get that much SSA in Eurasians even at K=3, but maybe it's partially because the SSA component includes many Capoid-Bambutids, so even West Africans only get 90-95% of the SSA component at K=3. In Gedrosia K3, even Somalis can get 3% East Eurasian ancestry, but maybe it's for similar reasons that even West Africans get East Eurasian ancestry in these runs.

    However there's also something weird about how Villabruna gets 10% SSA at K=3.

    At K=3, Norwegians get 8% of the East Eurasian component, but at K=6, Norwegians get 9% of the American component and 0% of the East Eurasian component. At K=6, Finns get 5% East Asian in addition to 10% American. At K=6, Karelia_HG gets 25% American, 42% WHG, and 32% LBK.



    Here's a SmartPCA run of the same samples without SSAs, Saharans, or Australo-Melanesians. This time the clustering and lines to nearest neighbors are not based on an FST matrix, but they're just based on the first 8 dimensions of the PCA multiplied by the square roots of the eigenvalues. When I include ancient populations in an FST run, there's usually a huge distance from some ancient populations to other populations. I don't know if it's because of missing data or something.
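    The scaling step can be sketched like this, assuming the eigenvalues are in a file with one value per line and the coordinates are in a file with a sample or population name followed by the PC columns. Both layouts and the name `scale_pcs` are assumptions for illustration, not the exact SmartPCA output formats. Each PC is multiplied by the square root of its eigenvalue, so that Euclidean distances on the result weight the first dimensions more heavily than the later ones.

```shell
# scale_pcs EIGENVALUES COORDS M: hypothetical helper that multiplies the first M
# principal components in COORDS ("name pc1 pc2 ...") by the square roots of the
# matching eigenvalues in EIGENVALUES (one per line) and drops the other columns.
scale_pcs() {
  awk -v m="$3" '
    NR == FNR { w[NR] = sqrt($1); next }   # first file: one weight per PC
    {
      out = $1
      for (j = 1; j <= m; j++) out = out " " sprintf("%.4f", $(j + 1) * w[j])
      print out
    }' "$1" "$2"
}
```

    The scaled table has the same shape as the admixture tables (a name followed by numeric columns), so the same distance and clustering code can be run on it unchanged.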



    BTW where can we see the proxies that were used in Gedrosia K3? Was the West Eurasian component based on LBK or something?

  8. #38
    Quote Originally Posted by Komintasavalta View Post

    BTW where can we see the proxies that were used in Gedrosia K3? Was the West Eurasian component based on LBK or something?
    Yes. It's a quote from gedwiki (link seems dead right now)

    Eurasia K3 - E Eurasian, W Eurasian, and Sub-Saharan African Calculator
    This calculator calculates an individual's E Eurasian, W Eurasian, and Sub-Saharan African admixture.

    The components are defined as follows:

    1- E Eurasian - This component peaks in E & SE populations such as Ami, Nivkh, Dai, Han, and Ulchi, at about 100%, followed by Siberian & other Asian populations such as Nganasans, Tibetans, Subba, and Mongola.

    2- W Eurasian - This component peaks in Neolithic European farmers such as Stuttgart, and LBK culture, as well as in most modern European populations at over 95%.

    3- SSA (Sub-Saharan African) - This component peaks in Sub-Saharan African populations such as Yoruban, Esan, and Luhiya at over 97%.

  9. #39
    New VUR vs Moor K2 calculator:

    $ x=vurmoor
    $ printf %s\\n Albanian Basque Basque.SDG Belarusian Besermyan Bulgarian Chuvash Cretan.DG Croatian Czech English Estonian Finnish Finnish.DG French French.SDG Gagauz Greek Hungarian Icelandic Italian_North Italian_South Karelian Lithuanian Maltese Mari.SDG Moldavian Mordovian Norwegian Norwegian.DG Orcadian Orcadian.SDG Polish.DG Romanian Russian Russian.SDG Russian_Archangelsk_Krasnoborsky Russian_Archangelsk_Leshukonsky Russian_Archangelsk_Pinezhsky Saami.DG Sardinian Scottish Sicilian Spanish Spanish_North Tatar_Kazan Tatar_Mishar Udmurt Ukrainian Ukrainian_North Veps>$x.pop
    $ awk -F\\t 'NR==FNR{a[$0];next}$8 in a&&!a[$3]++{print$3,$8}' $x.pop v44.3_HO_public.anno|awk '++a[$2]<=16'>$x.pick
    $ plink --allow-no-sex --bfile v44.3_HO_public --keep <(awk 'NR==FNR{a[$1];next}$2 in a' $x.pick v44.3_HO_public.fam) --make-bed --out $x
    [...]
    $ plink --allow-no-sex --bfile $x --indep-pairwise 50 10 .05 --out $x;plink --allow-no-sex --bfile $x --extract $x.prune.in --make-bed --out $x.p
    [...]
    $ admixture -j4 -C .1 $x.p.bed 2
    [...]
    $ tav()(awk '{n[$1]++;for(i=2;i<=NF;i++){a[$1][i]+=$i}}END{for(i in a){o=i;for(j=2;j<=NF;j++)o=o FS sprintf("%f",a[i][j]/n[i]);print o}}' "FS=${1-$'\t'}")
    $ awk 'NR==FNR{a[$1]=$2;next}{print a[$2]}' $x.{pick,fam}|paste -d' ' - $x.p.2.Q|sed -E 's/\.S?DG//'|tav ' '|sort -rnk2|awk '{for(i=2;i<=NF;i++)printf"%.0f ",100*$i;print$1}'
    100 0 Udmurt
    100 0 Chuvash
    100 0 Besermyan
    100 0 Russian_Archangelsk_Pinezhsky
    100 0 Russian_Archangelsk_Leshukonsky
    99 1 Veps
    96 4 Tatar_Kazan
    93 7 Karelian
    86 14 Tatar_Mishar
    83 17 Russian_Archangelsk_Krasnoborsky
    81 19 Finnish
    74 26 Mordovian
    68 32 Estonian
    66 34 Russian
    64 36 Lithuanian
    61 39 Ukrainian_North
    57 43 Belarusian
    54 46 Ukrainian
    37 63 Czech
    32 68 Hungarian
    32 68 Icelandic
    30 70 Gagauz
    28 72 Norwegian
    26 74 Scottish
    26 74 Moldavian
    23 77 Orcadian
    22 78 Croatian
    22 78 English
    17 83 Bulgarian
    16 84 Romanian
    13 87 French
    4 96 Albanian
    4 96 Greek
    0 100 Italian_North
    0 100 Sicilian
    0 100 Sardinian
    0 100 Maltese
    0 100 Italian_South
    0 100 Basque
    0 100 Spanish_North
    0 100 Spanish

    It's cool how Finns are 81% VUR but Norwegians are 72% Moor.
    Last edited by Komintasavalta; 05-29-2021 at 09:18 AM.

  10. #40
    A depigmented Eurasian with high siberian-like blood who talks about "wog" for the Sardinians and the Basques, it's cute.

    And the people here who say absolutely nothing about this completely stupid talk!

    Ambient stupidity.
