You need to add population numbers to the sixth field of the fam file: https://www.biostars.org/p/266511/. The commands below use integers starting from 10 as group identifiers, because the numbers 1, 2, and 9 have a special meaning (1 marks the sample as a control, 2 marks it as a case, and 9 makes SmartPCA ignore it).
`phylipoutname: fstfilename` saves an FST matrix to a file, but the values in the file only have three digits after the decimal point. There's also an undocumented parameter, `fsthiprecision: YES`, which causes the FST values printed to STDOUT to be multiplied by a million instead of a thousand, but it doesn't affect the contents of the `phylipoutname` file.
If an FST run includes more than 100 populations, SmartPCA exits with an error unless you include a parameter like `maxpops: 1000`.
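As a rough sketch, the parameters mentioned so far could be collected into a parameter file like the one below. The file names `$x.par` and `$x.phylip` are only placeholders I picked for illustration, and the value 1000 is just an example of a limit above 100.

```shell
x=uralic
# write a SmartPCA parameter file; maxpops lifts the default limit of 100 populations
cat > "$x.par" <<EOF
genotypename: $x.bed
snpname: $x.bim
indivname: $x.fam
fstonly: YES
fsthiprecision: YES
phylipoutname: $x.phylip
maxpops: 1000
EOF
cat "$x.par"
```

The file would then be passed to SmartPCA with `smartpca -p $x.par`.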
So I ended up with code like this:
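x=uralic
# populations to include, one per line
printf %s\\n Besermyan Enets Estonian Finnish Hungarian Karelian Mansi Mordovian Nganasan Saami.DG Selkup Udmurt Veps>$x.pop
# sort the anno file by column 15 in descending order, keep the first row for each value of column 3, and keep the sample ID and population name of samples in the listed populations
sed 1d v44.3_HO_public.anno|sort -t$'\t' -rnk15|awk -F\\t '!a[$3]++{print$2,$8}'|awk 'NR==FNR{a[$0];next}$2 in a' $x.pop ->$x.pick
# extract the picked samples to a new fileset
plink --allow-no-sex --bfile v44.3_HO_public --keep <(awk 'NR==FNR{a[$1];next}$2 in a' $x.pick v44.3_HO_public.fam) --make-bed --out $x
# number the populations starting from 10 and write the number to the sixth field of the fam file
awk '!a[$2]++{i++}{print$1,i}' $x.pick|awk 'NR==FNR{a[$1]=$2;next}{$6=a[$2]+9}1' - $x.fam>$x.famtemp;mv $x.fam{temp,}
# run SmartPCA in FST-only mode (with more than 100 populations, add maxpops:\ 1000 to the parameters)
smartpca -p <(printf %s\\n genotypename:\ $x.bed snpname:\ $x.bim indivname:\ $x.fam fstonly:\ YES fsthiprecision:\ YES)|tee $x.smartpca
# convert the FST matrix printed to STDOUT into a CSV file with population names as row and column labels
p=$(cut -d' ' -f2 $x.pick|awk '!a[$0]++');sed -n '/fst \*1000000/,/^$/p' $x.smartpca|sed 1,2d|sed \$d|tr -s ' ' ,|cut -d, -f3-|paste -d, <(echo "$p") -|cat <(printf %s\\n '' "$p"|paste -sd,) ->$x.fst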
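The values in the resulting `$x.fst` file are still multiplied by a million. Here's a small sketch of how they could be converted back to plain FST values. The file `toy.fst` is a made-up stand-in with invented numbers in the same layout (an empty first header field, then one row per population), not real output.

```shell
# toy stand-in for $x.fst; the numbers are invented for illustration
printf '%s\n' ',Finnish,Saami.DG,Udmurt' 'Finnish,0,21456,15873' 'Saami.DG,21456,0,30112' 'Udmurt,15873,30112,0' > toy.fst
# leave the header line alone and divide every value on the other lines by a million
awk -F, -v OFS=, 'NR>1{for(i=2;i<=NF;i++)$i=sprintf("%.6f",$i/1e6)}1' toy.fst > toy.fst.decimal
cat toy.fst.decimal
```

Six decimals are enough here because the printed values are integers that were scaled by a million.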
Maybe you're supposed to do LD pruning before calculating FST, because Kerminen et al. 2021 wrote this: "We calculated pairwise-FST between the reference groups (Fig 2) and the ancestor candidate groups (S9 Fig) using SmartPCA of EIGENSOFT package[7] (fstonly: YES, fsthiprecision: YES) and 56,661 LD-independent variants."
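I haven't tried it, but LD pruning with plink could look roughly like the sketch below. The function name `ld_prune` and the output prefix `.pruned` are my own placeholders, and the pruning settings (window of 50 SNPs, step of 10, r² threshold of 0.1) are only an assumption, because the quote doesn't say which settings Kerminen et al. used.

```shell
# hypothetical helper: LD-prune a PLINK fileset given its prefix
ld_prune() {
  # write the list of variants to keep to PREFIX.prune.in (the settings are assumed, not from the paper)
  plink --allow-no-sex --bfile "$1" --indep-pairwise 50 10 0.1 --out "$1" &&
  # make a new fileset that only contains the kept variants
  plink --allow-no-sex --bfile "$1" --extract "$1.prune.in" --make-bed --out "$1.pruned"
}
# usage: ld_prune uralic
```

It would probably fit between the plink step and the awk step that numbers the populations, with the later commands pointed at the pruned fileset.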