Quote:
Each simulation contains an average of ~30 million SNPs. In order to understand the performance of qpAdm with less data, we randomly down-sample the complete dataset to produce analysis datasets of 1 million, 100 thousand, and 10 thousand sites. In all cases, the average admixture proportion estimate generated is extremely close to the simulated α, although we do observe an increase in the amount of variance in the individual estimates as the amount of data analyzed decreases (Figure 3A; Supplementary Table 3). In order to increase computational efficiency and to better approximate typical analysis datasets, all subsequent analyses are performed on the data that has been randomly down-sampled to 1 million sites. We observe similar results when using non-random ascertainment schemes to select sites for analysis (Supplementary Table 4).
The impact of non-random ascertainment schemes on qpAdm analyses are described in more detail in a later section.
We find that qpAdm is robust to missing data, where data from randomly selected sites in each individual is considered missing with rate 10%, 25%, 50%, 75% or 90%
I generally prefer to stick with diploid genomes, but using pseudo-haploid data also has little effect on qpAdm results.
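To make the three data-reduction steps mentioned above concrete (random down-sampling of sites, random per-individual missingness, and pseudo-haploid calls), here is a minimal sketch in plain Python. It is not the paper's pipeline or anything from qpAdm itself. The function names, the toy genotype matrix, and the 0/1/2 genotype coding are all my own assumptions for illustration.

```python
import random


def downsample_sites(n_sites, n_keep, rng):
    """Pick n_keep site indices uniformly at random, without replacement."""
    return sorted(rng.sample(range(n_sites), n_keep))


def mask_missing(genotypes, rate, rng):
    """Set each call to None independently with probability `rate`.

    `genotypes` is a list of per-individual lists (hypothetical layout).
    """
    return [[None if rng.random() < rate else g for g in ind]
            for ind in genotypes]


def pseudo_haploidize(genotypes, rng):
    """Draw one allele at random from each diploid call.

    Assumed coding: 0, 1 or 2 copies of the alternate allele.
    Output is 0 or 1 (the sampled allele); missing calls stay None.
    """
    out = []
    for ind in genotypes:
        row = []
        for g in ind:
            if g is None:
                row.append(None)
            elif g == 1:
                # Heterozygote: either allele with equal probability.
                row.append(rng.choice((0, 1)))
            else:
                # Homozygote: the sampled allele is fully determined.
                row.append(g // 2)
        out.append(row)
    return out


# Toy data: 4 individuals x 1000 sites, purely illustrative.
rng = random.Random(42)
n_sites = 1000
genotypes = [[rng.choice((0, 1, 2)) for _ in range(n_sites)]
             for _ in range(4)]

kept = downsample_sites(n_sites, 100, rng)            # keep 100 sites
subset = [[ind[i] for i in kept] for ind in genotypes]
masked = mask_missing(subset, 0.5, rng)               # 50% missing rate
haploid = pseudo_haploidize(masked, rng)
```

On real data you would apply the same index selection to the SNP file and the genotype file together so that they stay in sync. The sketch only shows the sampling logic, not any file handling.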