Hi all. I programmed the impute.me module for ancestry, and I think I owe a few explanations.
First - it's not about "too few samples" in the database. I know other companies may use databases of hundreds of thousands of genomes from their customers, but the ancestry labels in those databases are all self-reported. If people self-report the wrong ancestry, that's a major bias. That's why I chose instead to base the calculation on the public samples from the 1000 Genomes Project, because the ancestry of those people was evaluated by anthropologists. The anthropologists also chose the core populations with the aim of covering as diverse a mix of the world as possible. In any case, PCA should be robust enough even with 1000 reference samples.
Second - yes, you can probably get much more exact read-outs elsewhere, say "you are 43.3% from country X". That's nice, but it is not necessarily correct. Read this funny post
http://www.legalgenealogist.com/2017...till-not-soup/ about a user who checked her genome with many different engines and got wildly different results. Giving results that fine-grained is simply over-reporting. That's why you are shown it like this, with the closest ancestries. Then you can judge the distance yourself, and also judge the likely accuracy from where the other related ethnicities sit.
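For anyone curious what a "closest ancestries" read-out looks like in practice, here is a minimal sketch in Python. It is not the actual impute.me code: the function names (fit_pca, project, closest_populations), the population labels and the genotypes are all made up for illustration, and the reference panel is random synthetic data rather than real 1000 Genomes samples. The idea is simply to fit PCA on labelled reference samples, project the new sample into the same space, and rank the reference populations by distance instead of reporting percentages.

```python
import numpy as np


def fit_pca(reference, n_components=2):
    """Fit PCA on a (samples x SNPs) genotype matrix using SVD."""
    mean = reference.mean(axis=0)
    centered = reference - mean
    # Rows of vt are the principal axes in SNP space.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]


def project(genotypes, mean, components):
    """Project genotypes onto the principal axes of the reference panel."""
    return (genotypes - mean) @ components.T


def closest_populations(sample_pc, ref_pcs, labels, top=3):
    """Rank reference populations by distance from the sample to each centroid."""
    labels = np.asarray(labels)
    ranked = []
    for pop in sorted(set(labels.tolist())):
        centroid = ref_pcs[labels == pop].mean(axis=0)
        ranked.append((pop, float(np.linalg.norm(sample_pc - centroid))))
    ranked.sort(key=lambda item: item[1])
    return ranked[:top]


# Synthetic demo data (an assumption for illustration, not real genotypes):
# three hypothetical populations with different allele frequencies.
rng = np.random.default_rng(0)
n_snps, n_per_pop = 200, 30
freqs = {pop: rng.uniform(0.05, 0.95, n_snps) for pop in ("POP_A", "POP_B", "POP_C")}

blocks, labels = [], []
for pop, f in freqs.items():
    # Genotypes are coded as 0, 1 or 2 copies of the alternate allele.
    blocks.append(rng.binomial(2, f, size=(n_per_pop, n_snps)).astype(float))
    labels += [pop] * n_per_pop
reference = np.vstack(blocks)

mean, components = fit_pca(reference)
ref_pcs = project(reference, mean, components)

# A new "user" sample drawn from the POP_A allele frequencies.
sample = rng.binomial(2, freqs["POP_A"]).astype(float)
sample_pc = project(sample, mean, components)

ranked = closest_populations(sample_pc, ref_pcs, labels)
for pop, dist in ranked:
    print(f"{pop}: distance {dist:.2f}")
```

The output is a ranked list of populations with distances, which is the same spirit as the plot: you see which reference groups are near, how near, and how far away the next-closest ones are.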