Visualizing an ADMIXTURE run as a polygonal diagram

**Lucas** · 05-10-2021, 05:49 PM

You should make a GitHub page with all those scripts. Then it could be officially used by others in their papers.

**~~Komintasavalta~~** · 05-10-2021, 08:59 PM

If you do a non-SSA ADMIXTURE run with two components, you can estimate the amount of East Eurasian ancestry in different populations:

Code:

$ sort -rnk2 maalima2.i.2a|awk '{printf"%.1f %s\n",100*$2,$1}'
100.0 Zhuang
100.0 Tujia
100.0 Tibetan_Yunnan
100.0 She
100.0 Qiang
100.0 Nivh
100.0 Negidal
100.0 Naxi
100.0 Nanai
100.0 Mulam
100.0 Miao
100.0 Maonan
100.0 Li
100.0 Korean
100.0 Gelao
100.0 Dong
100.0 Ami
100.0 Atayal
99.9 Dai
99.9 Han
99.9 Japanese
99.8 Vietnamese
99.8 Yi
99.8 China_Lahu
99.8 Ulchi
99.7 Kankanaey
99.7 Murut
99.4 Kinh
99.2 Hezhen
98.7 Ilocano
98.4 Oroqen
98.3 Sherpa
98.2 Xibo
98.1 Dusun
97.7 Daur
97.6 Tibetan
97.4 Yugur
97.0 Mongola
96.5 Rai
95.5 Yukagir_Tundra
94.9 Nganasan
94.8 Bonan
94.6 Koryak
94.5 Visayan
94.5 Tu
94.0 Evenk_Transbaikal
93.8 Itelmen
93.6 Gurung
93.2 Chukchi
93.2 Tagalog
92.0 Chukchi1
91.4 Eskimo_ChaplinSireniki
90.9 Eskimo_Naukan
90.3 Salar
89.7 Cambodian
89.7 Khamnegan
89.6 Thai
89.4 Dungan
88.6 Dongxiang
88.6 Magar
88.6 Malay
88.5 Yakut
87.3 Todzin
87.1 Buryat
86.9 Tamang
86.8 Burmese
86.2 Mongol
83.7 Dolgan
83.5 Tofalar
83.4 Evenk_FarEast
83.2 Tuvinian
83.2 Karitiana
82.0 Kalmyk
81.2 Piapoco
81.2 Mixe
79.7 Surui
78.5 Pima
78.5 Kusunda
77.2 Zapotec
76.2 Mixtec
76.0 Enets
75.7 Kazakh_China
74.7 Khakass_Kachin
74.2 Altaian
72.7 Bolivian
71.9 Mayan
71.6 Kyrgyz_China
70.8 Quechua
70.8 Nasioi
69.2 Kyrgyz_Kyrgyzstan
68.7 Tharu
68.7 Ket
68.7 Even
68.0 Kyrgyz_Tajikistan
67.9 Papuan
67.7 Khakass
66.1 Newar
65.3 Selkup
63.7 Kazakh
63.3 Shor_Khakassia
62.6 Shor_Mountain
62.6 Tubalar
62.6 Australian
58.1 Altaian_Chelkan
56.5 Karakalpak
55.1 Hazara
54.4 Uyghur
54.0 Nogai_Astrakhan
52.7 Mansi
51.7 Tatar_Siberian_Zabolotniye
48.7 Nogai_Stavropol
47.8 Tatar_Siberian
46.9 Yukagir_Forest
46.6 Tlingit
42.2 Bahun
42.0 Bengali
39.1 Uzbek
36.9 Aleut
35.5 Bashkir
35.4 Turkmen
33.2 Punjabi
32.0 GujaratiD
30.1 GujaratiC
29.1 Burusho
28.3 Udmurt
27.1 GujaratiB
26.3 Nogai_Karachay_Cherkessia
24.9 Besermyan
23.5 GujaratiA
23.4 Jew_Cochin
23.4 Chuvash
22.1 Sindhi_Pakistan
21.3 Tatar_Kazan
19.7 Tajik
19.6 Pathan
18.5 Kalash
15.6 Tatar_Mishar
15.5 Russian_Archangelsk_Leshukonsky
14.3 Balochi
13.9 Turkish_Balikesir
13.6 Brahui
13.1 Russian_Archangelsk_Pinezhsky
11.0 Makrani
10.6 Abazin
10.4 Kabardinian
10.4 Veps
9.6 Russian_Archangelsk_Krasnoborsky
9.5 Karachai
9.3 Balkar
9.1 Karelian
8.7 Circassian
8.4 Azeri
8.3 Mordovian
8.0 Finnish
8.0 Ossetian
7.1 Kumyk
7.1 Ezid
6.4 Turkish
6.1 Adygei
6.0 Ingushian
5.9 Iranian
5.8 Russian
5.1 Lak
4.9 Avar
4.9 Tabasaran
4.8 Chechen
4.6 Lezgin
4.6 Darginian
4.5 Kaitag
4.3 Kubachinian
3.3 Kurd
2.9 Estonian
2.6 Abkhasian
2.5 Belarusian
2.1 Gagauz
1.6 Ukrainian
1.6 Ukrainian_North
1.6 Lithuanian
1.3 Hungarian
1.1 Lebanese
1.1 Georgian
1.0 Jew_Iranian
1.0 Moldavian
0.9 Czech
0.8 Jew_Georgian
0.7 Norwegian
0.7 Syrian
0.7 Jew_Ashkenazi
0.7 Yemeni_Desert
0.6 Jordanian
0.6 Assyrian
0.6 Lebanese_Muslim
0.6 Bulgarian
0.5 Armenian
0.5 Armenian_Hemsheni
0.4 Croatian
0.4 Saudi
0.3 Yemeni_Northwest
0.3 BedouinA
0.3 Yemeni_Highlands
0.3 French
0.3 Egyptian
0.3 Romanian
0.3 English
0.2 Icelandic
0.2 Lebanese_Christian
0.2 Maltese
0.2 Scottish
0.2 Palestinian
0.2 Greek
0.2 Orcadian
0.1 Italian_North
0.1 Italian_South
0.1 Druze
0.1 Jew_Turkish
0.1 Albanian
0.1 Spanish
0.0 Jew_Iraqi
0.0 Jew_Moroccan
0.0 Basque
0.0 Jew_Yemenite
0.0 Spanish_North
0.0 Sicilian
0.0 Sardinian
0.0 Jew_Tunisian
0.0 Jew_Libyan
0.0 Cypriot
0.0 Canary_Islander
0.0 BedouinB

I first did a global K=3 run of modern samples, which I picked by requiring that the years BP field in the anno file was 0:

Code:

curl -LsO reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V44/V44.3/SHARE/public.dir/v44.3_HO_public.tar;tar -xf v44.3_HO_public.tar
f=v44.3_HO_public;convertf -p <(printf %s\\n genotypename:\ $f.geno snpname:\ $f.snp indivname:\ $f.ind outputformat:\ PACKEDPED genotypeoutname:\ $f.bed snpoutname:\ $f.bim indivoutname:\ $f.fam)
igno()(grep -Ev '\.REF|rel\.|fail\.|Ignore_|_dup|_contam|_lc|_father|_mother|_son|_daughter|_brother|_sister|_sibling|_twin|Neanderthal|Denisova|Vindija_light|Gorilla|Macaque|Marmoset|Orangutang|Primate_Chimp|hg19ref')
x=maalima;awk -F\\t 'NR>1{print$2,$8}' v44.3_HO_public.anno|igno|grep -Ev '\.(SG|SDG|DG|WGA)'|grep -v _o|cut -d' ' -f1|awk -F\\t 'NR==FNR{a[$0];next}$2 in a&&$6==0&&(!a[$3]++){print$2,$8}' - v44.3_HO_public.anno>$x.pick
plink --allow-no-sex --bfile v44.3_HO_public --keep <(awk 'NR==FNR{a[$1];next}$2 in a' $x.pick v44.3_HO_public.fam) --make-bed --out $x
plink --allow-no-sex --bfile $x --genome --out $x
awk 'FNR>1&&$10>=.3{print$2<$4?$2:$4}' $x.genome|awk 'NR==FNR{a[$0];next}!($1 in a)' - $x.pick>$x.i.pick
plink --allow-no-sex --bfile v44.3_HO_public --keep <(awk 'NR==FNR{a[$1];next}$2 in a' $x.i.pick v44.3_HO_public.fam) --make-bed --out $x.i
plink --allow-no-sex --bfile $x.i --indep-pairwise 50 10 .01 --out $x.i
plink --bfile $x.i --extract $x.i.prune.in --make-bed --out $x.i.p
tav()(awk '{n[$1]++;for(i=2;i<=NF;i++){a[$1,i]+=$i}}END{for(i in n){o=i;for(j=2;j<=NF;j++)o=o FS sprintf("%f",a[i,j]/n[i]);print o}}' "FS=${1-$'\t'}")
k=3;admixture -j4 -C .1 $x.i.p.bed $k;paste -d' ' <(awk 'NR==FNR{a[$1]=$2;next}{print$2,a[$2]}' $x.i.pick $x.i.p.fam) $x.i.p.$k.Q>$x.$k;cut -d' ' -f2- $x.$k|tav \ >$x.$k.ave
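
The per-population averaging step at the end of the pipeline above can be tried on its own. This is only a minimal sketch: the function name `popmean` and the toy numbers are made up for illustration, and it assumes space-separated input lines of the form `population q1 q2 ...`.

```shell
# popmean: average every numeric column per population label (column 1).
# Hypothetical helper for illustration; reads space-separated lines from stdin.
popmean() {
  awk '
    {
      n[$1]++                                    # samples seen per population
      for (i = 2; i <= NF; i++) sum[$1, i] += $i # running column sums
      if (NF > cols) cols = NF                   # remember the widest row
    }
    END {
      for (p in n) {                             # loop over populations, not over the sum keys
        line = p
        for (i = 2; i <= cols; i++) line = line " " sprintf("%.4f", sum[p, i] / n[p])
        print line
      }
    }' | sort
}

# Toy input (invented values): two Finnish samples and one Han sample at K=2.
printf '%s\n' 'Finnish 0.10 0.90' 'Finnish 0.06 0.94' 'Han 1.00 0.00' | popmean
```

Looping over the per-population counter instead of the two-dimensional sum array keeps this portable to awks that don't have gawk's arrays of arrays.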

Then I selected samples that had less than 20% of the SSA component (except for Australians and Papuans, which I kept) and I did a new K=2 run.
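
A filter like that could look roughly like the sketch below. It assumes a file with lines of the form `sampleID population q1 q2 q3`; which column holds the SSA component depends on the run, so the column number is passed in as an argument. The function name `keep_low` and the toy rows are invented for illustration.

```shell
# keep_low COL MAX EXEMPT: print the IDs of samples whose value in column COL
# is below MAX, plus all samples from the comma-separated EXEMPT populations.
# Assumed input on stdin: "sampleID population q1 q2 q3", space-separated.
keep_low() {
  awk -v col="$1" -v max="$2" -v exempt="$3" '
    BEGIN { n = split(exempt, e, ","); for (i = 1; i <= n; i++) keep[e[i]] }
    ($2 in keep) || ($col + 0 < max + 0) { print $1 }'
}

# Toy rows with invented values; column 3 plays the role of the SSA component.
printf '%s\n' \
  'S1 Yoruba 0.98 0.01 0.01' \
  'S2 Finnish 0.00 0.92 0.08' \
  'S3 Papuan 0.35 0.10 0.55' \
  'S4 Moroccan 0.25 0.74 0.01' |
  keep_low 3 0.2 Australian,Papuan
```

The resulting ID list can then be matched against the `.fam` file for `plink --keep`, the same way the pick files are used in the pipeline.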

Next I did a K=3 run of the non-SSA samples. Even though I excluded some North African populations with the highest SSA ancestry, the third component still became an SSA-like or basal-like component, where both Egyptians and Papuans have about 50% of the third component. Maltese have 22% of the third component and Greeks have 9%. Even Thais have 6% of the third component.

At K=4, Americans and Siberians split off from East-Southeast Asians. Kets have 70% of the American-Siberian component because they have so much ANE.

The clustering and nearest neighbors are based on just the runs at K=2, K=3, and K=4, because the K=5 run has already taken more than an hour. Maybe I should've done more aggressive LD pruning, because almost 100,000 SNPs remained even after `--indep-pairwise 50 10 .03`.

Sorry for all these huge images, but regular-size images look like crap on a retina display.

Originally Posted by Lucas

You should make a GitHub page with all those scripts. Then it could be officially used by others in their papers.

I already deleted my websites and my GitHub account years ago, and I decided that I was no longer going to make any contributions to the world.

GitHub is too post-web-2.0 for my taste anyway. Old-school static websites are nicer.

Also I don't think they would like to use a script that says `set.seed(1488)`.

**~~Komintasavalta~~** · 05-11-2021, 09:15 AM

I had to leave it running overnight, but the runs at K=6, K=7, and K=8 have now finished.

At K=7, I got a component that is similar to the Gedrosia component in Dodecad K7b. It is maximal in Kalash, Brahui, Sindhi_Pakistan, and Balochi. In the official K7b spreadsheet, the Gedrosia component is maximal in Brahui, Balochi, Makrani, and Sindhi.

The European component is the highest in Lithuanians (96%) but it's the fifth highest in Spanish_North (94%) and the eighth highest in Basques (93%). In ADMIXTURE models where Southwestern Europeans have a high proportion of a European component, usually Uralic people have fairly high Mongoloid ancestry, and here also the proportion of the Nganasan component is 10% in Finns, 13% in Vepsians, and 30% in Udmurts.

Based on the links to the three nearest neighbors, there is a path from Finns to Mongols: first from Finnish to Veps, then to Tatar_Mishar, Tatar_Kazan, Chuvash, Udmurt, Aleut, Tlingit, Mansi, Altaian_Chelkan, Tubalar, Khakass, Altaian, Evenk_FarEast, Kalmyk, and then to Mongol. I didn't realize it until recently, but there is actually a huge genetic gap produced by the Gobi Desert, where Khalkha Mongols have a high genetic distance to Han and northern Chinese ethnicities. It is also visible in this image, where there is no line that connects Mongols to Hans, apart from lines that go through South Asians or Australians. However, my method for calculating the nearest neighbors could still be improved, because currently one of the three closest neighbors of Australians is Karakalpaks.
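
For anyone who wants to experiment with the nearest-neighbor links, here is a rough sketch. The exact distance measure behind the lines in the image isn't shown in this thread, so this simply assumes plain Euclidean distance between rows of per-population values; the function name `nearest` and the toy table are made up.

```shell
# nearest K: for each row "name v1 v2 ...", print the K nearest other rows
# by Euclidean distance (assumption: plain Euclidean distance on the columns).
# Hypothetical helper; reads the whole table into memory.
nearest() {
  awk -v k="${1:-3}" '
    { name[NR] = $1; for (j = 2; j <= NF; j++) v[NR, j] = $j; cols = NF }
    END {
      for (a = 1; a <= NR; a++) {
        for (b = 1; b <= NR; b++) {
          s = 0
          for (j = 2; j <= cols; j++) { d = v[a, j] - v[b, j]; s += d * d }
          dist[b] = sqrt(s)
          used[b] = (a == b)              # a row is never its own neighbor
        }
        line = name[a] ":"
        for (r = 1; r <= k && r < NR; r++) {
          best = 0
          for (b = 1; b <= NR; b++)       # pick the closest row not yet reported
            if (!used[b] && (best == 0 || dist[b] < dist[best])) best = b
          used[best] = 1
          line = line " " name[best]
        }
        print line
      }
    }'
}

# Toy table with invented two-component values for five populations A to E.
printf '%s\n' 'A 0.0 1.0' 'B 0.1 0.9' 'C 0.3 0.7' 'D 0.7 0.3' 'E 1.0 0.0' | nearest 2
```

Because every population always gets K neighbors no matter how far away they are, isolated populations end up linked to something implausible, which is probably part of the Australian and Karakalpak problem.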

**Lemminkäinen** · 05-11-2021, 10:17 AM

Structure makes this triangle straight from the genome data.

**Zoro** · 05-11-2021, 10:28 AM

Originally Posted by Komintasavalta

If you do a non-SSA ADMIXTURE run with two components, you can estimate the amount of East Eurasian ancestry in different populations:

[...]

Good job with graphing! I like how you continuously try to find new ways to visualize your results. Keep on doing what you do.

Here are a few notes on your unsupervised K=3 ADMIXTURE run:

- Clustering should not necessarily be interpreted as gene flow or as shared genetic drift within the last 30,000 years. The classic example is Neanderthals clustering with SSA in ADMIXTURE. So SNP ascertainment can substantially skew the results.

- You’ll notice problems with SNP ascertainment in the Reich dataset: Turkmens, Tatars, Altaians, and a couple of others don’t cluster as expected (maybe that’s why you didn’t include them?).

- Papuans having 50% of the orange component along with Egyptians and Bedouin should not be interpreted as orange being Basal Eurasian, because Papuans should not be scoring that much Basal Eurasian. The orange component may be nothing more than clustering due to very ancient, million-year-old alleles or some other non-meaningful SNP artifact.

- Ezidi Kurds having higher East Eurasian than Kurmanji or Sorani Kurds would not make sense in my experience, since they don’t have as much Central Asian input as other Kurds.

If you want to make a K3 based on East and West Eurasian, my suggestion is a supervised run using good LBK samples as West Eurasian proxies.

**~~Komintasavalta~~** · 05-11-2021, 10:18 PM

I now figured out how to use the `circlize` package to draw a circular stacked bar chart: https://jokergoo.github.io/circlize_book/book/. Next I'll try to learn how to add a thin bar for each individual sample within a population.

Code:

library(circlize)
library(vegan) # for reorder.hclust (may be masked by the package seriation)
library(dendextend) # for color_branches

f="uralaltaic.i"
kvals=c(3,7)
columnorder=list(c(3,2,1),c(1,5,4,3,6,7,2))

mats=sapply(kvals,function(x)read.table(paste0(f,".",x,"a"),r=1)[,columnorder[lapply(columnorder,length)==x][[1]]])

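# Join the Q matrices from all runs column-wise and cluster on Euclidean distances.
joined=do.call(cbind,sapply(Sys.glob(paste0(f,".[0-9]a")),function(x)read.table(x,r=1)))
# Note: this data frame shares its name with dist(), but R still resolves dist(...) calls to the function.
dist=as.data.frame(as.matrix(dist(joined)))
hc=hclust(dist(joined))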

hc=reorder(hc,dist[,"Nganasan"]-dist[,"Estonian"])
# hc=reorder(hc,mats[[1]][,3]-mats[[1]][,1])
# maxdist=which(dist==max(dist))[1];hc=reorder(hc,dist[,maxdist%%nrow(dist)]-dist[,maxdist%/%nrow(dist)+1])

labelcolor=hcl(c(260,120,60,0,220,160,310,90)+15,60,70)
barcolor=list(hcl(c(220,120,310)+15,60,70),hcl(c(220,60,120,0,270,90,310)+15,60,70))

labels=hc$labels[hc$order]
cut=cutree(hc,8)
dend=color_branches(as.dendrogram(hc),k=length(unique(cut)),col=labelcolor[unique(cut[labels])])

circos.clear()
png("a.png",w=2500,h=2500,res=300)
circos.par(cell.padding=c(0,0,0,0))
circos.initialize(0,xlim=c(0,nrow(mats[[1]])))

circos.track(ylim=c(0,1),bg.border=NA,track.height=.2,track.margin=c(.005,0),panel.fun=function(x,y)
  for(i in 1:nrow(mats[[1]]))circos.text(i-.5,0,labels[i],adj=c(0,.5),facing="clockwise",niceFacing=T,cex=.65,col=labelcolor[cut[labels[i]]])
)

for(j in length(mats):1)circos.track(ylim=c(0,1),track.height=.25,track.margin=c(0,.01),bg.lty=0,panel.fun=function(x,y){
  mat=as.matrix(mats[[j]][hc$order,])
  pos=1:nrow(mat)-.5
  barwidth=1
  for(i in 1:ncol(mat)){
    seq1=rowSums(mat[,seq(i-1),drop=F])
    seq2=rowSums(mat[,seq(i),drop=F])
    circos.rect(pos-barwidth/2,if(i==1){0}else{seq1},pos+barwidth/2,seq2,col=barcolor[[j]][i],border="gray20",lwd=.1)
  }
  for(i in 1:ncol(mat)){
    seq1=rowSums(mat[,seq(i-1),drop=F])
    seq2=rowSums(mat[,seq(i),drop=F])
    lab=round(100*mat[,i])
    lab[lab<=1]=""
    circos.text(pos,if(i==1){seq1/2}else{seq1+(seq2-seq1)/2},labels=lab,col="gray10",cex=.5,facing="downward")
  }
})

circos.track(ylim=c(0,attr(dend,"height")),track.height=.25,track.margin=c(0,.0015),bg.border=NA,panel.fun=function(x,y)circos.dendrogram(dend))

circos.clear()
dev.off()

Originally Posted by Lemminkäinen

Structure makes this triangle straight from the genome data.

I guess you mean this (from the Structure manual, https://web.stanford.edu/group/pritc...ucture_doc.pdf):

I thought that this was something that was invented by me...

**~~Komintasavalta~~** · 05-17-2021, 04:41 PM

Originally Posted by Zoro

If you want to make a K3 based on East and West Eurasian, my suggestion is a supervised run using good LBK samples as West Eurasian proxies.

I haven't been very successful in making supervised runs where I have only used ancient samples as references. But if I do an unsupervised run with the right mixture of modern and ancient samples, I can get components for WHG and LBK at a relatively low K value.

The image below shows two ADMIXTURE runs at K=3 and K=6, where I included modern samples with the suffix `.DG`, and I included ancient samples with over 500,000 SNPs and with mean age BP over 6,000. I used `--indep-pairwise 50 10 .05` which kept 72,163 SNPs.
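
The sample selection described above could be sketched like this. The real anno file has many more columns, so this assumes a simplified tab-separated table of `ID`, `mean age BP`, and `SNP count`; the function name `pick_samples` and the sample IDs in the toy input are invented.

```shell
# pick_samples: keep modern samples (age 0) whose ID ends in ".DG", and
# ancient samples with mean age above 6000 BP and more than 500000 SNPs.
# Assumed input on stdin: ID <tab> mean age BP <tab> SNP count (simplified layout).
pick_samples() {
  awk -F'\t' -v minsnps=500000 -v minage=6000 '
    $2 == 0 && $1 ~ /\.DG$/     { print $1; next }  # modern sample with .DG suffix
    $2 > minage && $3 > minsnps { print $1 }        # old enough and well covered
  '
}

# Toy rows with invented IDs and values.
printf '%s\t%s\t%s\n' \
  S_Finnish-1.DG 0 590000 \
  FIN001 0 590000 \
  I0001 7000 800000 \
  I0002 7000 300000 \
  I0003 4500 900000 |
  pick_samples
```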

Now I don't get that much SSA in Eurasians even at K=3, but maybe it's partially because the SSA component includes many Capoid-Bambutids, so even West Africans only get 90-95% of the SSA component at K=3. In Gedrosia K3, even Somalis can get 3% East Eurasian ancestry, but maybe it's for similar reasons that even West Africans get East Eurasian ancestry in these runs.

However, there's also something weird about how Villabruna gets 10% SSA at K=3.

At K=3, Norwegians get 8% of the East Eurasian component, but at K=6, Norwegians get 9% of the American component and 0% of the East Eurasian component. At K=6, Finns get 5% East Asian in addition to 10% American. At K=6, Karelia_HG gets 25% American, 42% WHG, and 32% LBK.

Here's a SmartPCA run of the same samples without SSAs, Saharans, or Australo-Melanesians. This time the clustering and lines to nearest neighbors are not based on an FST matrix, but they're just based on the first 8 dimensions of the PCA multiplied by the square roots of the eigenvalues. When I include ancient populations in an FST run, there's usually a huge distance from some ancient populations to other populations. I don't know if it's because of missing data or something.
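
The scaling step is simple enough to show as a sketch. This assumes a table with lines of the form `name pc1 pc2 ...` and the eigenvalues given as a comma-separated argument; the function name `scale_pcs` and the numbers are made up, and a real SmartPCA `.evec` file would first need its header line and population column stripped.

```shell
# scale_pcs EIGENVALUES: multiply PC column j by sqrt(eigenvalue j).
# Assumed input on stdin: "name pc1 pc2 ...", space-separated;
# EIGENVALUES is a comma-separated list, one value per PC to keep.
scale_pcs() {
  awk -v ev="$1" '
    BEGIN { n = split(ev, e, ",") }
    {
      line = $1
      for (j = 1; j <= n; j++) line = line " " sprintf("%.3f", $(j + 1) * sqrt(e[j]))
      print line
    }'
}

# Toy input with invented coordinates and eigenvalues 9 and 4.
printf '%s\n' 'Finnish 0.02 -0.01' 'Han -0.05 0.00' | scale_pcs 9,4
```

After the scaling, Euclidean distances between rows give more weight to the dimensions that explain more variance, which is the point of multiplying by the square roots of the eigenvalues.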

BTW where can we see the proxies that were used in Gedrosia K3? Was the West Eurasian component based on LBK or something?

**Petalpusher** · 05-17-2021, 05:03 PM

Originally Posted by Komintasavalta

BTW where can we see the proxies that were used in Gedrosia K3? Was the West Eurasian component based on LBK or something?

Yes. Here's a quote from gedwiki (the link seems dead right now):

Eurasia K3 - E Eurasian, W Eurasian, and Sub-Saharan African Calculator
This calculator calculates an individual's E Eurasian, W Eurasian, and Sub-Saharan African admixture.

The components are defined as follows:

1- E Eurasian - This component peaks in E & SE populations such as Ami, Nivkh, Dai, Han, and Ulchi, at about 100%, followed by Siberian & other Asian populations such as Nganasans, Tibetans, Subba, and Mongola.

2- W Eurasian - This component peaks in Neolithic European farmers such as Stuttgart, and LBK culture, as well as in most modern European populations at over 95%.

3- SSA (Sub-Saharan African) - This component peaks in Sub-Saharan African populations such as Yoruban, Esan, and Luhiya at over 97%.

**~~Komintasavalta~~** · 05-29-2021, 08:44 AM

New VUR vs Moor K2 calculator:

Code:

$ x=vurmoor
$ printf %s\\n Albanian Basque Basque.SDG Belarusian Besermyan Bulgarian Chuvash Cretan.DG Croatian Czech English Estonian Finnish Finnish.DG French French.SDG Gagauz Greek Hungarian Icelandic Italian_North Italian_South Karelian Lithuanian Maltese Mari.SDG Moldavian Mordovian Norwegian Norwegian.DG Orcadian Orcadian.SDG Polish.DG Romanian Russian Russian.SDG Russian_Archangelsk_Krasnoborsky Russian_Archangelsk_Leshukonsky Russian_Archangelsk_Pinezhsky Saami.DG Sardinian Scottish Sicilian Spanish Spanish_North Tatar_Kazan Tatar_Mishar Udmurt Ukrainian Ukrainian_North Veps>$x.pop
$ awk -F\\t 'NR==FNR{a[$0];next}$8 in a&&!a[$3]++{print$3,$8}' $x.pop v44.3_HO_public.anno|awk '++a[$2]<=16'>$x.pick
$ plink --allow-no-sex --bfile v44.3_HO_public --keep <(awk 'NR==FNR{a[$1];next}$2 in a' $x.pick v44.3_HO_public.fam) --make-bed --out $x
[...]
$ plink --allow-no-sex --bfile $x --indep-pairwise 50 10 .05 --out $x;plink --allow-no-sex --bfile $x --extract $x.prune.in --make-bed --out $x.p
[...]
$ admixture -j4 -C .1 $x.p.bed 2
[...]
$ tav()(awk '{n[$1]++;for(i=2;i<=NF;i++){a[$1][i]+=$i}}END{for(i in a){o=i;for(j=2;j<=NF;j++)o=o FS sprintf("%f",a[i][j]/n[i]);print o}}' "FS=${1-$'\t'}")
$ awk 'NR==FNR{a[$1]=$2;next}{print a[$2]}' $x.{pick,fam}|paste -d' ' - $x.p.2.Q|sed -E 's/\.S?DG//'|tav ' '|sort -rnk2|awk '{for(i=2;i<=NF;i++)printf"%.0f ",100*$i;print$1}'
100 0 Udmurt
100 0 Chuvash
100 0 Besermyan
100 0 Russian_Archangelsk_Pinezhsky
100 0 Russian_Archangelsk_Leshukonsky
99 1 Veps
96 4 Tatar_Kazan
93 7 Karelian
86 14 Tatar_Mishar
83 17 Russian_Archangelsk_Krasnoborsky
81 19 Finnish
74 26 Mordovian
68 32 Estonian
66 34 Russian
64 36 Lithuanian
61 39 Ukrainian_North
57 43 Belarusian
54 46 Ukrainian
37 63 Czech
32 68 Hungarian
32 68 Icelandic
30 70 Gagauz
28 72 Norwegian
26 74 Scottish
26 74 Moldavian
23 77 Orcadian
22 78 Croatian
22 78 English
17 83 Bulgarian
16 84 Romanian
13 87 French
4 96 Albanian
4 96 Greek
0 100 Italian_North
0 100 Sicilian
0 100 Sardinian
0 100 Maltese
0 100 Italian_South
0 100 Basque
0 100 Spanish_North
0 100 Spanish

It's cool how Finns are 81% VUR but Norwegians are 72% Moor.

**Flashball** · 05-29-2021, 10:00 AM

A depigmented Eurasian with high siberian-like blood who talks about "wog" for the Sardinians and the Basques, it's cute.

And the people here who say absolutely nothing about this completely stupid talk!

Ambient stupidity.