
Thread: Plink related questions

  1. #1
    Veteran Member Zoro's Avatar
    Join Date
    Dec 2017
    Last Online
    01-22-2023 @ 10:21 AM
    Meta-Ethnicity
    Indo-Iranian
    Ethnicity
    Kurd
    Ancestry
    74.31% W. Eurasian + 11.42% E. Eurasian + 5.42% S. Eurasian + 8.85% Basal Eurasian/African
    Country
    United States
    Region
    Kurdistan
    Y-DNA
    Q-M25
    mtDNA
    W4
    Gender
    Posts
    2,225
    Thumbs Up
    Received: 1,249
    Given: 524

    Plink related questions

    This thread is for asking Plink-related questions and for posting IBS or IBD results from Plink commands such as --genome or any other Plink command.

    @ Komintasavalta


    Regarding your question in the other thread about how to convert from the .geno/.snp/.ind format used in Admixtools to Plink .bed/.bim/.fam, here's how I do it:

    Create a text file like this. Use your own file name instead of sample, and save it as par.PED.PACKEDPED:

    genotypename: sample.geno
    snpname: sample.snp
    indivname: sample.ind
    outputformat: PACKEDPED
    genotypeoutname: sample.bed
    snpoutname: sample.bim
    indivoutname: sample.fam
    familynames: YES

    At a Linux terminal, execute the command:

    ......../convertf -p par.PED.PACKEDPED

    Put the path to the ADMIXTOOLS convertf binary in place of the dots.

    You'll receive 3 Plink files: .bed, .bim, and .fam.

    Check your .fam file. You can edit the population names and sample IDs to something you like, or to match how they were named in the .ind file.
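
The parameter file above can also be generated from a small shell function, which avoids retyping it for each dataset. This is only a sketch: the helper name `write_convertf_par` is made up for illustration, and the last step assumes `convertf` is on your PATH and that `sample.geno` exists in the current directory.

```shell
# Hypothetical helper: print a convertf parameter file for one dataset prefix.
write_convertf_par() {
  prefix=$1
  cat <<EOF
genotypename: $prefix.geno
snpname: $prefix.snp
indivname: $prefix.ind
outputformat: PACKEDPED
genotypeoutname: $prefix.bed
snpoutname: $prefix.bim
indivoutname: $prefix.fam
familynames: YES
EOF
}

# Write the parfile, then run convertf only if the binary and the input file are present.
write_convertf_par sample > par.PED.PACKEDPED
if command -v convertf >/dev/null 2>&1 && [ -f sample.geno ]; then
  convertf -p par.PED.PACKEDPED
fi
```

Replacing `sample` with another prefix produces the matching parfile for that dataset.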

  2. #2
    Zoro

    The discussion of Lezgins, Chechens, and Daghestanis in Iraq, Lezgin-Kurd IBS closeness, and Lezgin IBS results is located at:

    https://www.theapricity.com/forum/sh...s-Vol-4/page52

  3. #3
    Zoro

    The discussion of G25 distance results wrongly showing:

    1- Eurasians such as Mongols closer to Khomani-San and Ju-Hoan than to Mbuti
    2- Eurasians such as Kurds closer to SSA than to other Eurasians such as Papuans, Karitiana, and Surui
    3- Kurds closer to Jordanians than to Uyghur, Baloch, Brahui, etc.

    Conclusion: The above leads to overestimation of SW Asian and African ancestry in W Asians such as Kurds, and underestimation of E Asian and Siberian ancestry.

    and of Plink IBS results showing the above correctly, in contrast to G25,

    is located at https://www.theapricity.com/forum/sh...ome-here/page2

  4. #4
    Banned
    Join Date
    Sep 2020
    Last Online
    09-12-2023 @ 03:47 PM
    Location
    Komi Republic
    Meta-Ethnicity
    Finno-Permic
    Ethnicity
    Peasant
    Ancestry
    Komi
    Country
    Finland
    Taxonomy
    Karaboğa (euryprosopic, platyrrhine, dolichocephalic)
    Relationship Status
    Virgin
    Gender
    Posts
    2,170
    Thumbs Up
    Received: 4,862
    Given: 2,946


    Yeah I figured it out already. I did something like this to make a global PCA of modern individuals in the Reich dataset:

    Code:
    # Download and unpack the public 1240K release
    wget reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V44/V44.3/SHARE/public.dir/v44.3_1240K_public.tar
    tar -xf v44.3_1240K_public.tar
    # Convert .geno/.snp/.ind to Plink .bed/.bim/.fam
    f=v44.3_1240K_public;convertf -p <(printf %s\\n genotypename:\ $f.geno snpname:\ $f.snp indivname:\ $f.ind outputformat:\ PACKEDPED genotypeoutname:\ $f.bed snpoutname:\ $f.bim indivoutname:\ $f.fam)
    # Pick modern samples (sample ID and population label), skipping 1KGPhase rows and labels matching _dup, Ignore_, .REF, or _o
    sed 1d v44.3_1240K_public.anno|grep -v 1KGPhase|awk -F\\t '$9=="Modern"{print$2,$13}'|grep -Ev '_dup|Ignore_|\.REF|_o'|sed -E 's/\.(SDG|DG|SG)$//'>picks0
    # Keep the .fam rows whose sample ID (column 2) was picked
    awk 'NR==FNR{a[$1];next}$2 in a' picks0 v44.3_1240K_public.fam>picks
    # Subset the dataset, then run PCA on SNPs with at most 0.1% missing genotypes
    plink --bfile v44.3_1240K_public --keep picks --allow-no-sex --make-bed --out picks
    plink --bfile picks --pca --geno .001 --allow-no-sex --out picks
    # Prepend the population label to each eigenvector row
    paste -d' ' <(cut -d' ' -f2 picks0) <(cut -d' ' -f2- picks.eigenvec)>picks.eigenvec.2

    When I didn't add `--geno .001`, it messed up the clustering of some populations at first, so that, for example, one South Asian population clustered together with Africans.

    I couldn't get convertf to compile, but I downloaded a Mac binary from here: https://github.com/chrchang/eigensoft. The Mac binaries for plink from Harvard's website didn't work, but there were working binaries by another maintainer here: https://www.cog-genomics.org/plink/1.9/. You can download the v44.3_1240K_public.tar file manually from here: https://reichdata.hms.harvard.edu/pu...ses/index.html.

    I then made this in R:



    Code:
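    library(tidyverse)
    library(colorspace)
    
    f="picks"
    # Column 1 is the population label, column 2 the sample ID, the rest are PCs
    t=read.table(paste0(f,".eigenvec.2"),sep=" ")
    eig=as.double(readLines(paste0(f,".eigenval")))
    
    # t=cbind(t[,c(1,2)],t(t(t[,-c(1,2)])*sqrt(eig))) # I think this corresponds to scaling in G25
    
    # Axis labels; percentages are relative to the sum of the eigenvalues in the .eigenval file
    pct=paste0("PC",seq(length(eig))," (",sprintf("%.1f",100*eig/sum(eig)),"%)")
    
    # Average the PCs within each population
    ave=aggregate(t[,-c(1,2)],list(t[,1]),mean)
    names(ave)=c("pop",paste0("PC",seq(ncol(ave)-1)))
    
    # Cut a Ward clustering of the population averages into 12 groups, used for coloring
    k=cutree(hclust(dist(ave[,-1]),method="ward.D2"),k=12)
    write.csv(k,"/tmp/k",quote=F)
    ave$k=k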
    
    ggplot(ave,aes(x=PC1,y=PC2,label=pop))+
    geom_point(aes(color=as.factor(k)),size=.5)+
    geom_polygon(data=ave%>%group_by(k)%>%slice(chull(PC1,PC2)),alpha=.2,aes(color=as.factor(k),fill=as.factor(k)),size=.3)+
    geom_text(aes(label=pop,color=as.factor(k)),size=2,vjust=-.7)+
    theme(
      aspect.ratio=3/4,
      axis.text=element_text(color="black",size=7),
      axis.ticks.length=unit(0,"pt"),
      axis.ticks.x=element_blank(),
      axis.ticks.y=element_blank(),
      axis.title=element_text(color="black",size=10),
      legend.position="none",
      panel.background=element_rect(fill="white"),
      panel.grid.major=element_line(color="gray75",size=.2)
    )+
    scale_x_continuous(breaks=seq(-2,2,.1),expand=expansion(mult=.07))+
    scale_y_continuous(breaks=seq(-2,2,.1),expand=expansion(mult=.06))+
    labs(x=pct[1],y=pct[2])+
    scale_color_discrete_qualitative(palette="Set 2",c=80,l=40)
    
    ggsave("output.png")
    system("/usr/local/bin/mogrify -trim -bordercolor white -border 20x20 output.png")

    However, there's something wrong with the distances in my PCA. For example, Finns are about 4.5 times as far from Khomani_San (.705) as from Yoruba (.155):

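    Code:
    # tav: average columns 2..NF per label in column 1 (arrays of arrays need gawk)
    $ tav(){ awk '{n[$1]++;for(i=2;i<=NF;i++){a[$1][i]+=$i}}END{for(i in a){o=i;for(j=2;j<=NF;j++)o=o FS sprintf("%f",a[i][j]/n[i]);print o}}' "FS=${1-$'\t'}";}
    # dist: Euclidean distance from the last row of the second argument to every row of the first argument, sorted ascending
    $ dist(){ awk -F, 'NR==FNR{for(i=2;i<=NF;i++)a[i]=$i;next}$1{s=0;for(i=2;i<=NF;i++)s+=($i-a[i])^2;print s^.5,$1}' "$2" "$1"|sort -n|awk '{printf"%.3f %s\n",$1,$2}'|sed s,^0,,;}
    # Build per-population averages, then list the 16 populations farthest from Finnish
    $ paste -d' ' <(cut -d' ' -f2 maailma0) <(cut -d' ' -f3- maailma.eigenvec)|tav ' '|tr ' ' ,>ave;dist ave <(grep Finnish ave)|tail -n16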
    .130 Karitiana
    .130 Mende
    .135 Gambian
    .136 Esan
    .155 Yoruba
    .174 Papuan
    .189 Mandenka
    .206 BantuKenya
    .216 Biaka
    .242 BantuSA
    .268 Mbuti
    .336 Ju_hoan_North
    .492 BantuHerero
    .492 BantuSA_Herero
    .497 BantuTswana
    .705 Khomani_San

    Am I supposed to apply some further quality control or filtering? I tried to include only samples with few missing SNPs: `awk -F\\t '$21>9e5' v44.3_1240K_public.anno`. I also tried increasing the value of the `--geno` option, and I tried adding an option like `--max-maf .3`. None of it helped, however.

    I also tried multiplying the columns of the table by the square roots of the eigenvalues, but it didn't help:

    Code:
    f="picks"
    t=read.table(paste0(f,".eigenvec.2"),sep=" ")
    eig=as.double(readLines(paste0(f,".eigenval")))
    
    t2=(cbind(t[,c(1,2)],t(t(t[,-c(1,2)])*sqrt(eig))))
    
    ave=aggregate(t[,-c(1,2)],list(t[,1]),mean)
    ave2=aggregate(t2[,-c(1,2)],list(t2[,1]),mean)
    
    ind=cbind(paste0(t[,1],":",t[,2]),t[,-c(1,2)])
    ind2=cbind(paste0(t2[,1],":",t2[,2]),t2[,-c(1,2)])
    
    write.table(ind,paste0(f,".ind"),quote=F,sep=",",col.names=F,row.names=F)
    write.table(ind2,paste0(f,".indscaled"),quote=F,sep=",",col.names=F,row.names=F)
    write.table(ave,paste0(f,".ave"),quote=F,sep=",",col.names=F,row.names=F)
    write.table(ave2,paste0(f,".avescaled"),quote=F,sep=",",col.names=F,row.names=F)

    Last edited by Komintasavalta; 03-12-2021 at 08:35 AM.

  5. #5
    Veteran Member Apricity Funding Member
    "Friend of Apricity"


    Join Date
    Oct 2016
    Last Online
    @
    Ethnicity
    me
    Country
    European Union
    Y-DNA
    R1a > YP1337 > R-BY160486*
    mtDNA
    H3*
    Gender
    Posts
    6,066
    Thumbs Up
    Received: 7,243
    Given: 2,623


    Quote Originally Posted by Komintasavalta
    Yeah I figured it out already. I did something like this to make a global PCA of modern individuals in the Reich dataset:

    Code:
    wget reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V44/V44.3/SHARE/public.dir/v44.3_1240K_public.tar
    tar -xf v44.3_1240K_public.tar
    f=v44.3_1240K_public;convertf -p <(printf %s\\n genotypename:\ $f.geno snpname:\ $f.snp indivname:\ $f.ind outputformat:\ PACKEDPED genotypeoutname:\ $f.bed snpoutname:\ $f.bim indivoutname:\ $f.fam)
    sed 1d v44.3_1240K_public.anno|grep -v 1KGPhase|awk -F\\t '$9=="Modern"{print$2,$13}'|grep -Ev '_dup|Ignore_|\.REF|_o'|sed -E 's/\.(SDG|DG|SG)$//'|gv BIR>picks0
    awk 'NR==FNR{a[$1];next}$2 in a' picks0 v44.3_1240K_public.fam>picks
    plink --bfile v44.3_1240K_public --keep picks --allow-no-sex --make-bed --out picks
    plink --bfile picks --pca --geno .001 --allow-no-sex --out picks
    paste -d' ' <(cut -d' ' -f2 picks0) <(cut -d' ' -f2- picks.eigenvec)>picks.eigenvec.2

    Why not SmartPCA? Davidski used it for G25, not Plink's PCA: https://eurogenes.blogspot.com/2017/...-bias-fix.html

  6. #6
    Lucas

    For a plink dataset, also do LD-based pruning: https://zzz.bwh.harvard.edu/plink/summary.shtml#prune

    plink --file data --indep-pairwise 50 5 0.5 (for the last value, a lower threshold such as 0.3 is better)

    =======================================

    Also filter on the missing rate per person: https://zzz.bwh.harvard.edu/plink/thresh.shtml#miss2

    plink --file mydata --mind 0.1

    ==========================================
    Also exclude SNPs by minor allele frequency: https://zzz.bwh.harvard.edu/plink/thresh.shtml#maf

    plink --file mydata --maf 0.05

    After that the dataset will of course be smaller, but it should be better.
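
Chained together, the three filters above might look like the following for a binary (.bed/.bim/.fam) dataset. This is a sketch rather than a recommended pipeline: the function name `print_qc_commands`, the prefix `mydata`, and the default thresholds are placeholders, and it uses `--bfile` instead of `--file` on the assumption that the data is already in binary format. It only prints the commands so they can be reviewed before running.

```shell
# Hypothetical helper: print plink commands for sample/SNP filtering followed by LD pruning.
# Arguments: dataset prefix, --mind threshold, --maf threshold, r^2 threshold for pruning.
print_qc_commands() {
  prefix=$1
  mind=${2:-0.1}
  maf=${3:-0.05}
  r2=${4:-0.3}
  # 1. Filter samples by missing-genotype rate and SNPs by minor allele frequency.
  echo "plink --bfile $prefix --mind $mind --maf $maf --allow-no-sex --make-bed --out ${prefix}_qc"
  # 2. List SNPs in approximate linkage equilibrium (window of 50 SNPs, step of 5).
  echo "plink --bfile ${prefix}_qc --indep-pairwise 50 5 $r2 --out ${prefix}_qc"
  # 3. Keep only the SNPs listed in the .prune.in file written by step 2.
  echo "plink --bfile ${prefix}_qc --extract ${prefix}_qc.prune.in --make-bed --out ${prefix}_pruned"
}

print_qc_commands mydata
```

Piping the output to `sh` would run the three steps in order; passing different thresholds as the second to fourth arguments changes the filters without editing the function.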

  7. #7
    Komintasavalta

    Quote Originally Posted by Lucas
    Why not SmartPCA? Davidski used it for G25, not PlinkPCA https://eurogenes.blogspot.com/2017/...-bias-fix.html

    I tried SmartPCA with the whole Reich dataset at first, but the dataset was rejected because there were more than 100 populations:

    $ f=g/v44.3_1240K_public/v44.3_1240K_public;smartpca -p <(printf %s\\n genotypename:\ $f.geno snpname:\ $f.snp indivname:\ $f.ind evecoutname:\ evec evaloutname:\ eval)
    parameter file: /dev/fd/63
    ### THE INPUT PARAMETERS
    ##PARAMETER NAME: VALUE
    genotypename: g/v44.3_1240K_public/v44.3_1240K_public.geno
    snpname: g/v44.3_1240K_public/v44.3_1240K_public.snp
    indivname: g/v44.3_1240K_public/v44.3_1240K_public.ind
    evecoutname: evec
    evaloutname: eval
    ## smartpca version: 10210
    norm used

    read 1073741824 bytes
    read 2147483648 bytes
    read 2859357147 bytes
    packed geno read OK
    number of populations too large. Increase maxpops if you wish
    fatalx:
    (makeeglist) You really want to analyse more than 100 populations?

    I think the limit has to be changed in the source code, where it's defined as `#define MAXPOPS 100`. Adding a maxpops option to the parfile had no effect, and it's not documented as one of the parfile options (https://github.com/chrchang/eigensof.../POPGEN/README).

    Next I tried SmartPCA with a subset of samples from the Reich dataset:

    $ plink --bfile g/bed/v44.3_1240K_public --keep <(awk -F\\t '$9=="Modern"&&$21>9e5{print$2}' g/v44.3_1240K_public/v44.3_1240K_public.anno|grep -v REF|head -n200|awk 'NR==FNR{a[$0];next}$2 in a' - g/bed/v44.3_1240K_public.fam) --make-bed --out reichsubset
    $ f=reichsubset;smartpca -p <(printf %s\\n genotypename:\ $f.bed snpname:\ $f.bim indivname:\ $f.fam evecoutname:\ $f.evec evaloutname:\ $f.eval numoutlieriter:\ 0)

    Without the option `numoutlieriter: 0`, it removed 30 of the 200 samples as outliers (including all SSAs).

    However, as with `plink --pca`, the distances to Khoisan and Bambutids seemed too high.

    Actually what I needed was `--maf .05`:

    $ plink --bfile g/bed/v44.3_1240K_public --keep <(awk -F\\t '$9=="Modern"&&$21>9e5{print$2}' g/v44.3_1240K_public/v44.3_1240K_public.anno|grep -v REF|head -n200|awk 'NR==FNR{a[$0];next}$2 in a' - g/bed/v44.3_1240K_public.fam) --allow-no-sex --maf .05 --make-bed --out withmaf
    $ plink --bfile g/bed/v44.3_1240K_public --keep <(awk -F\\t '$9=="Modern"&&$21>9e5{print$2}' g/v44.3_1240K_public/v44.3_1240K_public.anno|grep -v REF|head -n200|awk 'NR==FNR{a[$0];next}$2 in a' - g/bed/v44.3_1240K_public.fam) --allow-no-sex --make-bed --out nomaf
    $ f=withmaf;smartpca -p <(printf %s\\n genotypename:\ $f.bed snpname:\ $f.bim indivname:\ $f.fam evecoutname:\ $f.evec evaloutname:\ $f.eval numoutlieriter:\ 0)
    $ f=nomaf;smartpca -p <(printf %s\\n genotypename:\ $f.bed snpname:\ $f.bim indivname:\ $f.fam evecoutname:\ $f.evec evaloutname:\ $f.eval numoutlieriter:\ 0)
    $ sed 1d withmaf.evec|awk '{$1=$1}NF--' OFS=,|cut -d: -f2 >withmafdist
    $ sed 1d nomaf.evec|awk '{$1=$1}NF--' OFS=,|cut -d: -f2 >nomafdist
    $ dist(){ awk -F, 'NR==FNR{for(i=2;i<=NF;i++)a[i]=$i;next}$1{s=0;for(i=2;i<=NF;i++)s+=($i-a[i])^2;print s^.5,$1}' "$2" "$1"|sort -n|awk '{printf"%.3f %s\n",$1,$2}'|sed s,^0,,;}
    $ dist withmafdist <(grep Finnish withmafdist)|tail -n16
    .439 B_Karitiana-3.DG
    .443 S_Eskimo_Sireniki-1.DG
    .453 S_Eskimo_Sireniki-2.DG
    .455 S_BedouinB-2.DG
    .472 S_Eskimo_Chaplin-1.DG
    .473 S_Eskimo_Naukan-1.DG
    .473 A_Ju_hoan_North-5.DG
    .475 S_Eskimo_Naukan-2.DG
    .486 S_Khomani_San-1.DG
    .495 B_Ju_hoan_North-4.DG
    .503 S_Ju_hoan_North-1.DG
    .511 S_BedouinB-1.DG
    .516 S_Ju_hoan_North-2.DG
    .597 A_Mbuti-5.DG
    .601 B_Mbuti-4.DG
    .627 S_Mbuti-3.DG
    $ dist nomafdist <(grep Finnish nomafdist)|tail -n16
    .314 S_Papuan-2.DG
    .316 A_Karitiana-4.DG
    .318 S_Papuan-9.DG
    .323 A_Papuan-16.DG
    .326 B_Karitiana-3.DG
    .524 B_Mbuti-4.DG
    .530 B_Ju_hoan_North-4.DG
    .546 S_Ju_hoan_North-1.DG
    .553 S_Ju_hoan_North-2.DG
    .562 S_Mbuti-3.DG
    .592 S_Khomani_San-1.DG
    .629 A_Mbuti-5.DG
    .738 B_Yoruba-3.DG
    .880 S_Yoruba-2.DG
    1.003 A_Yoruba-4.DG
    1.008 A_Ju_hoan_North-5.DG

    With `--maf .05` I got a plot similar to G25, but with `--maf .01` the distance from Bambutids and Capoids to other humans was reduced only moderately.

    But what if the distance between Finns and Ju'Hoan is actually supposed to be much bigger than the distance between Finns and Karitiana? Could it be artificially reduced by G25 because it removes minor alleles that are specific to Capoids?
    Last edited by Komintasavalta; 03-12-2021 at 12:05 PM.

  8. #8
    Zoro

    Quote Originally Posted by Komintasavalta
    I tried SmartPCA with the whole Reich dataset at first, but the dataset was rejected because there were more than 100 populations:

    But what if the distance between Finns and Ju'Hoan is actually supposed to be much bigger than the distance between Finns and Karitiana? Could it be artificially reduced by G25 because it removes minor alleles that are specific to Capoids?

    I wouldn't use --maf, because that removes positions with a minor allele frequency below the threshold, for example below 0.01 with --maf 0.01. I would instead use --max-maf, which does the opposite. For example, --max-maf 0.4 removes uninformative alleles that are common in your data (frequency > 40%).

    Also, I would use --geno 0.001 if one wants an overlapping set of SNPs in all samples; in other words, if one doesn't want some samples to have more SNPs than others.
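
To make the difference between these filters concrete, here is a toy illustration in awk, not plink itself. It assumes a made-up table of SNP ID, minor allele frequency, and missing-genotype rate, and applies the same inequalities the plink options use: a SNP is kept when its MAF is at least the `--maf` threshold, at most the `--max-maf` threshold, and its missingness is at most the `--geno` threshold. The names `snp_table` and `filter_snps` are invented for the example.

```shell
# Toy SNP table (invented values): ID, minor allele frequency, missing-genotype rate.
snp_table() {
  cat <<EOF
rs_rare 0.004 0.000
rs_low 0.030 0.000
rs_mid 0.200 0.050
rs_common 0.450 0.000
EOF
}

# Print IDs of SNPs that pass: min_maf <= MAF <= max_maf and missingness <= max_missing.
# Arguments: min_maf, max_maf, max_missing.
filter_snps() {
  awk -v lo="$1" -v hi="$2" -v miss="$3" '$2 >= lo && $2 <= hi && $3 <= miss { print $1 }'
}

echo "--maf 0.01 keeps:";    snp_table | filter_snps 0.01 0.5 1
echo "--max-maf 0.4 keeps:"; snp_table | filter_snps 0 0.4 1
echo "--geno 0.001 keeps:";  snp_table | filter_snps 0 0.5 0.001
```

In this toy table, `--maf 0.01` drops only the rare SNP, `--max-maf 0.4` drops only the most common one, and `--geno 0.001` drops only the SNP with missing genotypes.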

  9. #9
    Komintasavalta

    Actually `--maf .05` is probably way too high. When I tried `plink --pca` with different `--maf` settings, `--maf .05` caused Finns to be less than twice as far from Khomani San as from Armenians. The effect of `--maf` became noticeable between .005 and .01, but it became huge between .01 and .05.

    $ for x in {0001,001,005,01,05};do plink --bfile g/bed/v44.3_1240K_public --keep <(awk -F\\t '$9=="Modern"&&$2~/\.DG$/{print$2,$13}' g/v44.3_1240K_public/v44.3_1240K_public.anno|grep -Ev 'REF\.|Ignore_|_o'|awk 'NR==FNR{a[$1];next}$2 in a' - g/bed/v44.3_1240K_public.fam) --allow-no-sex --maf .$x --geno .1 --pca --out $x;done
    $ for x in {0001,001,005,01,05};do printf %s\\n '' "--maf .$x --geno .1:";grep Finnish-1 $x.eigenvec|awk 'NR==1{for(i=3;i<=NF;i++)a[i]=$i;next}{s=0;for(i=3;i<=NF;i++)s+=(a[i]-$i)^2;print s^.5,$2}' - $x.eigenvec|sort -n|egrep '(Khomani_San|Karitiana|Mbuti|Eskimo_Sireniki|Armenian|Hungarian)-1';done

    --maf .0001 --geno .1:
    0.0307083 S_Hungarian-1.DG
    0.100753 S_Armenian-1.DG
    0.236026 S_Eskimo_Sireniki-1.DG
    0.300353 S_Karitiana-1.DG
    0.532705 S_Mbuti-1.DG
    1.0062 S_Khomani_San-1.DG

    --maf .001 --geno .1:
    0.0307083 S_Hungarian-1.DG
    0.100753 S_Armenian-1.DG
    0.236026 S_Eskimo_Sireniki-1.DG
    0.300353 S_Karitiana-1.DG
    0.532705 S_Mbuti-1.DG
    1.0062 S_Khomani_San-1.DG

    --maf .005 --geno .1:
    0.0346009 S_Hungarian-1.DG
    0.140324 S_Armenian-1.DG
    0.265385 S_Eskimo_Sireniki-1.DG
    0.3049 S_Karitiana-1.DG
    0.532409 S_Mbuti-1.DG
    1.00394 S_Khomani_San-1.DG

    --maf .01 --geno .1:
    0.0461182 S_Hungarian-1.DG
    0.246201 S_Armenian-1.DG
    0.318358 S_Eskimo_Sireniki-1.DG
    0.324301 S_Karitiana-1.DG
    0.542761 S_Mbuti-1.DG
    0.79708 S_Khomani_San-1.DG

    --maf .05 --geno .1:
    0.100773 S_Hungarian-1.DG
    0.304498 S_Armenian-1.DG
    0.429683 S_Eskimo_Sireniki-1.DG
    0.450306 S_Khomani_San-1.DG
    0.536177 S_Mbuti-1.DG
    0.632922 S_Karitiana-1.DG

    Here's the same with no `--geno` option:

    --maf .0001:
    0.030676 S_Hungarian-1.DG
    0.100718 S_Armenian-1.DG
    0.236023 S_Eskimo_Sireniki-1.DG
    0.300351 S_Karitiana-1.DG
    0.532722 S_Mbuti-1.DG
    1.0062 S_Khomani_San-1.DG

    --maf .001:
    0.030676 S_Hungarian-1.DG
    0.100718 S_Armenian-1.DG
    0.236023 S_Eskimo_Sireniki-1.DG
    0.300351 S_Karitiana-1.DG
    0.532722 S_Mbuti-1.DG
    1.0062 S_Khomani_San-1.DG

    --maf .005:
    0.0345906 S_Hungarian-1.DG
    0.14085 S_Armenian-1.DG
    0.265786 S_Eskimo_Sireniki-1.DG
    0.304951 S_Karitiana-1.DG
    0.532434 S_Mbuti-1.DG
    1.00393 S_Khomani_San-1.DG

    --maf .01:
    0.0460322 S_Hungarian-1.DG
    0.246952 S_Armenian-1.DG
    0.318423 S_Eskimo_Sireniki-1.DG
    0.324423 S_Karitiana-1.DG
    0.543871 S_Mbuti-1.DG
    0.796861 S_Khomani_San-1.DG

    --maf .05:
    0.100475 S_Hungarian-1.DG
    0.30487 S_Armenian-1.DG
    0.429481 S_Eskimo_Sireniki-1.DG
    0.450133 S_Khomani_San-1.DG
    0.53606 S_Mbuti-1.DG
    0.633111 S_Karitiana-1.DG

    BTW, some of the individuals in my previous PCA were marked with `Ignore_`, like the one outlier Ju'Hoan. I probably shouldn't have included them.
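
One way to summarise the runs above is the ratio of the Finnish-to-Khomani San distance to the Finnish-to-Armenian distance at each `--maf` setting. The sketch below just redoes that arithmetic in awk, using the distances copied from the `--geno .1` output; the function name `maf_ratios` is made up.

```shell
# Distances from Finnish-1, copied from the --geno .1 runs above.
# Columns: --maf threshold, distance to S_Armenian-1.DG, distance to S_Khomani_San-1.DG.
maf_ratios() {
  awk '{ printf "%s %.2f\n", $1, $3 / $2 }' <<EOF
.0001 0.100753 1.0062
.001 0.100753 1.0062
.005 0.140324 1.00394
.01 0.246201 0.79708
.05 0.304498 0.450306
EOF
}

# Print one line per setting: threshold, then Khomani San distance / Armenian distance.
maf_ratios
```

The ratio falls from roughly 10 at `--maf .001` to below 2 at `--maf .05`, which is the "less than twice as far" effect described above.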

  10. #10
    Zoro

    Maybe you didn't understand what I said, so I'll repeat: it's a bad idea to use --maf, because that does the opposite of what we want. It removes informative population-specific alleles, in other words the rarer alleles.

    You should use --max-maf instead, which removes uninformative alleles common to all populations, in other words very, very ancient alleles.

    Try --max-maf 0.4 and --geno 0.001 and repost.

    Also check your plink .bim file against the dbSNP database to make sure you do not have some flipped alleles, because that's pretty common with plink.

    Plink .bim col 5 should have the alt allele and col 6 the ref allele. I bet the order is wrong on some of your positions in the .bim file.
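
A rough way to check allele order is to compare columns 5 and 6 of the .bim file with a reference table. The sketch below assumes a hypothetical whitespace-separated file with SNP ID, reference allele, and alternate allele (for example, extracted from dbSNP by some other means), and labels each shared SNP as a match, swapped, or different. Two caveats: Plink 1.x by default puts the minor allele in column 5 and the major allele in column 6, so a swap relative to ref/alt is not necessarily an error, and strand flips are only reported as "different" here, not resolved.

```shell
# Hypothetical helper: compare .bim alleles against a reference table.
# $1: reference file with columns "SNP_ID REF ALT" (assumed format)
# $2: Plink .bim file (columns: chromosome, SNP ID, cM, bp, allele 1, allele 2)
check_bim_alleles() {
  awk '
    NR == FNR { ref[$1] = $2; alt[$1] = $3; next }   # first file: load reference alleles
    !($2 in ref) { next }                             # skip SNPs missing from the reference
    $5 == alt[$2] && $6 == ref[$2] { print $2, "match"; next }
    $5 == ref[$2] && $6 == alt[$2] { print $2, "swapped"; next }
    { print $2, "different" }
  ' "$1" "$2"
}

# Tiny invented example: rs1 matches, rs2 is swapped, rs3 differs, rs4 is not in the reference.
tmp=$(mktemp -d)
printf '%s\n' 'rs1 A G' 'rs2 C T' 'rs3 G A' > "$tmp/ref.txt"
printf '%s\n' '1 rs1 0 100 G A' '1 rs2 0 200 C T' '1 rs3 0 300 T C' '1 rs4 0 400 A C' > "$tmp/toy.bim"
check_bim_alleles "$tmp/ref.txt" "$tmp/toy.bim"
rm -r "$tmp"
```

SNPs reported as "different" are the ones worth inspecting by hand, since they may indicate a strand flip or a genuinely different variant.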
