The datasets from Lazaridis et al. 2014 and 2016 can be downloaded from https://reich.hms.harvard.edu/datasets. The 1240K+HO dataset has a sample with the same ID for all 2068 samples in the Lazaridis 2016 dataset, and for 1945 of the 1963 samples in the Lazaridis 2014 dataset. The remaining 18 samples consist of some standard ancient samples, a chimp reference genome, and 3 Australian samples that have a different ID. Almost all of the samples in Lazaridis 2016 are also in Lazaridis 2014: only 130 of the 2068 are not.
Code:
$ curl https://reichdata.hms.harvard.edu/pub/datasets/amh_repo/curated_releases/V44/V44.3/SHARE/public.dir/v44.3_HO_public.anno -Lso ho.anno
$ wc -l NearEastPublic/HumanOriginsPublic2068.ind # Lazaridis 2016
2068
$ awk '{print$1}' NearEastPublic/HumanOriginsPublic2068.ind|awk -F\\t 'NR==FNR{a[$0];next}$2 in a{print$4}' - ho.anno|sort|uniq -c|sort
   5 PickrellNatureCommunications2012
 127 LazaridisNature2016
 881 PattersonGenetics2012
1055 LazaridisNature2014
$ wc -l EuropeFullyPublic/data.ind # Lazaridis 2014
1963
$ awk '{print$1}' EuropeFullyPublic/data.ind|awk -F\\t 'NR==FNR{a[$0];next}$2 in a{print$4}' - ho.anno|sort|uniq -c|sort
   1 Genome
   5 PickrellNatureCommunications2012
 887 PattersonGenetics2012
1052 LazaridisNature2014
$ awk 'NR==FNR{a[$1];next}!($1 in a)' EuropeFullyPublic/data.ind NearEastPublic/HumanOriginsPublic2068.ind|wc -l
130
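The Lazaridis 2014 samples that have no ID match in the annotation file can also be listed directly instead of only counted. This is a rough sketch, not part of the original commands: the function name `missing_ids` is made up here, and it assumes, as the commands above do, that the sample ID is the first whitespace-separated field of the .ind file and the second tab-separated field of the .anno file.

```shell
# missing_ids IND ANNO
# Print each sample ID from IND that does not occur in column 2 of ANNO.
# Assumption: IND is whitespace-separated with the ID in field 1,
# and ANNO is tab-separated with the ID in field 2.
missing_ids() {
  awk '{print $1}' "$1" |
    awk -F'\t' 'NR==FNR{a[$2];next}!($0 in a)' "$2" -
}
```

Running `missing_ids EuropeFullyPublic/data.ind ho.anno` should then print the IDs of the unmatched samples, which would be 18 lines if the counts above hold.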