Unravelling the genetic story of North Eurasia, a part of the world largely ignored by population geneticists, is the main reason I started my BGA project. That's because understanding the genetic substructures within this zone is pivotal to understanding the makeup of my own "Balto-Slavic" genome. I'm now happy to report that in the last few months I've collected enough samples, from Northwestern Europe to the Bering Strait, to be able to run at least some meaningful analyses on the subject, so here goes...
The comparison below is, as far as I know, the first time ever that a decent number of Baltic Finns, from all over Finland, have been publicly analyzed alongside samples from across North Eurasia using high-density genome-wide markers. What you're seeing is the output from a model-based algorithm called ADMIXTURE, with eight ancestral populations assumed (K=8), based on 320,000 SNPs (spreadsheet available here). And yes, I'm as surprised as anyone that it took this long to do it, and that it's me doing it.
http://img209.imageshack.us/img209/8954/neura8j.png
Now, before you get all worked up about the results, it's vital to understand that ADMIXTURE doesn't actually show admixture. It doesn't even show any ancestral components. All it does is tell us the chances of each of the samples in the comparison being mistaken for the others. It does this by working out the maximum likelihood of the samples sharing membership in clusters, with the number of clusters picked by the user (me, in this case). From this information it's possible to infer admixture. So yeah, it can be a bit fickle, arbitrary and "noisy", but it's still fairly useful, especially when the results are cross-checked against those from other types of analyses, like MDS and IBD.
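To make the "maximum likelihood of cluster membership" idea a bit more concrete, here's a minimal sketch of the kind of likelihood a model-based clustering tool of this type tries to maximize. Each genotype is treated as two draws of an allele whose frequency is a membership-weighted mix of the cluster frequencies. The toy genotypes, the two clusters and all the names (Q_good, Q_bad, F) are made up for illustration, and this only scores a given set of memberships. It doesn't do the actual optimization.

```python
import math

def log_likelihood(genotypes, Q, F):
    """Binomial log-likelihood of 0/1/2 genotypes (constant terms dropped).

    genotypes[i][j]: allele count of sample i at SNP j (0, 1 or 2)
    Q[i][k]: fraction of sample i assigned to cluster k (rows sum to 1)
    F[k][j]: frequency of the counted allele at SNP j in cluster k
    """
    total = 0.0
    for i, row in enumerate(genotypes):
        for j, g in enumerate(row):
            # expected allele frequency for this sample at this SNP
            p = sum(Q[i][k] * F[k][j] for k in range(len(F)))
            total += g * math.log(p) + (2 - g) * math.log(1.0 - p)
    return total

# Hypothetical toy data: 3 samples, 3 SNPs, 2 clusters
F = [[0.9, 0.8, 0.1],
     [0.1, 0.2, 0.9]]
genotypes = [[2, 2, 0],
             [0, 0, 2],
             [1, 1, 1]]

# Memberships that match the genotypes, and the same memberships swapped
Q_good = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
Q_bad = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]

print(log_likelihood(genotypes, Q_good, F) > log_likelihood(genotypes, Q_bad, F))
```

The memberships that fit the genotypes score a higher likelihood than the swapped ones. That's all the bar plot really encodes: which cluster memberships make the observed genotypes most probable, given the number of clusters chosen.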
The most striking part about the plot above, for us Euros anyway, is probably the so-called Fenno-Scandic (cream colored) cluster. I tagged it "Fenno-Scandic" because it peaks in the Baltic Finnish samples, but the name isn't important. Interestingly, it shows up in hefty amounts all the way from the British Isles to the Volga-Ural region (note its presence in the Chuvash samples). But it drops like a rock as one moves towards Central Europe. So does this mean there was a Baltic Finnish invasion of Ireland on the one hand, and of the Volga-Ural on the other? Nope, it doesn't.
That cluster is there because I clearly oversampled the Baltic Finns, letting the algorithm easily deduce their relatively unique characteristics, and in large part making them set the tone of the analysis for the other European samples. It's actually made up of several quite different ancestral components carried by Baltic Finns: Northwestern European, Northeastern European and Uralic. So if we go back to what ADMIXTURE really shows (i.e. the probability of the samples being mistaken for one another) everything makes sense. Indeed, a recent analysis done by the Institute for Molecular Medicine Finland (FIMM) basically showed the same thing:
Genetically, Finns have more in common with, for example, the Dutch or Russians living in the area of Murom, to the east of Moscow, than with our linguistic relations, the Hungarians; genetic closeness clearly follows geographic distance more closely than linguistic distance.
Also, note how on my plot above some of the Finns come out very differently from their kinfolk, i.e. less Finnish and more North/Central European. This too was picked up by the FIMM:
Owing to our settlement history, the genetic differences among Finns are great on both the east/west and north/south axes; the greater the geographic distance is, the greater the genetic differences are. In comparing the Finnish dialect areas, the greatest genetic differences are found between Finns of Southwest Finland and inhabitants of Kuusamo in Northeast Finland.
The linguistic link between Swedish-speaking Finns living in coastal areas and Swedes is also reflected in the greater genetic closeness of these two groups in comparison with Finnish speakers.
As I said above, the large number of Baltic Finns makes it easy for the algorithm to come up with a Baltic Finnish specific cluster, so only some of them end up scoring clear North, Central and/or East Asian admixtures. The rest of their Eastern ancestry is contained within the Fenno-Scandic component. This shows in the Fst (genetic) distances between the eight clusters:
http://img507.imageshack.us/img507/6529/neurafst.png
The Fenno-Scandic cluster is the European cluster closest to the Asian clusters. Also, check out the presence of what I call the "North Eurasian" cluster, which is about the same distance from Europe as it is from East Asia. This so-called component peaks in the Selkup and Ket samples, and is marked blue on the above ADMIXTURE plot. It also shows up all over Northern Europe. So this might well be a sign of ancient genetic continuity across North Eurasia, at least to some degree.
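For anyone wondering what an Fst distance between two clusters boils down to, here's a rough sketch using a simple Wright-style definition (the drop in expected heterozygosity within clusters relative to the pooled total). This isn't necessarily the estimator behind the table above, and the allele frequencies and cluster names below are invented purely for illustration.

```python
def fst(freqs_a, freqs_b):
    """Wright-style Fst between two clusters from per-SNP allele frequencies.

    Uses a ratio of sums across SNPs and weights both clusters equally.
    """
    numerator = 0.0
    denominator = 0.0
    for pa, pb in zip(freqs_a, freqs_b):
        p_bar = (pa + pb) / 2.0
        h_total = 2.0 * p_bar * (1.0 - p_bar)                  # pooled heterozygosity
        h_within = (2.0 * pa * (1.0 - pa) + 2.0 * pb * (1.0 - pb)) / 2.0
        numerator += h_total - h_within
        denominator += h_total
    return numerator / denominator if denominator > 0 else 0.0

# Hypothetical allele frequencies at three SNPs for three made-up clusters
cluster_x = [0.20, 0.50, 0.70]
cluster_y = [0.25, 0.45, 0.65]   # similar to cluster_x
cluster_z = [0.70, 0.10, 0.20]   # very different from cluster_x

print(fst(cluster_x, cluster_y) < fst(cluster_x, cluster_z))
```

Clusters with similar allele frequencies get an Fst close to zero, and the value climbs towards one as the frequencies diverge, which is how a table like the one above should be read.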
An MDS analysis of the same samples, using the same 320,000 markers, produces very similar results. The Baltic Finns stream towards North Eurasia from the North/Northwest European cluster, overlapping with the Russians and some of the North Russians (from Vologda Oblast, just east of Finland). At the same time, the Selkups and Ket position themselves roughly between the far Northeast Eurasians and Europeans. Indeed, you can see a hint of a stream flowing across the north, with the Europeans seemingly more attracted towards the lower left of the plot than to the upper left, where Turko-Mongols and East Asians proper reside.
http://img255.imageshack.us/img255/6818/neura12.png
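In case it helps to see what an MDS analysis does under the hood, below is a bare-bones sketch: pairwise distances between samples are computed from their genotypes, and the samples are then placed along the axis that best preserves those distances. I'm assuming an identity-by-state (IBS) style distance here, the toy genotypes are invented, and only the first dimension is extracted (by simple power iteration), so treat it as an illustration rather than a description of the exact pipeline behind the plot.

```python
import math

def ibs_distance(g1, g2):
    """1 minus the mean proportion of alleles shared by state (genotypes 0/1/2)."""
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return 1.0 - shared / (2.0 * len(g1))

def classical_mds_1d(dist, iterations=200):
    """First classical MDS coordinate from a symmetric distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared distances to get an inner-product matrix
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the dominant eigenvector
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [x * scale for x in v]

# Hypothetical genotypes: two samples from each of two made-up groups
samples = [
    [2, 2, 0, 0, 2, 0],   # group A
    [2, 1, 0, 0, 2, 0],   # group A
    [0, 0, 2, 2, 0, 2],   # group B
    [0, 0, 2, 1, 0, 2],   # group B
]
dist = [[ibs_distance(a, b) for b in samples] for a in samples]
coords = classical_mds_1d(dist)
print(coords)
```

The two group A samples land on one side of the axis and the two group B samples on the other. Intermediate samples would fall somewhere in between, which is what produces the "streams" between clusters on a real plot.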
I've also run some preliminary intra-North Eurasian Identity by Descent (IBD) comparisons. I won't post any details because my methodology is still a bit unorthodox (i.e. I'm using PLINK, which tends to overestimate IBD sharing when non-homogeneous sample sets are being compared). However, it's definitely worth a mention because it shows that, even at very low thresholds, Baltic Finns don't share IBD segments with Asians, while Selkups don't share any with Europeans (except one Selkup, who also shows European, possibly Russian, influence on the above ADMIXTURE plot). So it certainly seems, as far as I can tell, that the aforementioned Selkup "North Eurasian" cluster, phylogenetically about halfway between Northeast Asia and Europe, is pretty damn old. Similarly, the North/East Eurasian affinity of the Baltic Finns is also looking like an ancient phenomenon. On the other hand, I am seeing some hits between Anatolian Turks, Uzbeks, Uygurs, Hazara and Pathans. We'll see how that goes as I try different types of software and thresholds.
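To give a feel for what "sharing IBD segments" and "thresholds" mean in practice, here's a very rough sketch of one common heuristic: scan two samples for long runs of SNPs with no opposite homozygotes (0 versus 2), and only keep runs above a minimum length. To be clear, this is not how PLINK estimates IBD and it's not the method I used. The function name, the toy genotypes and the threshold are all made up for illustration.

```python
def candidate_ibd_segments(g1, g2, min_snps):
    """Return (start, end) SNP index pairs for runs without opposite homozygotes.

    Genotypes are coded 0/1/2. A run is kept only if it spans at least
    min_snps consecutive SNPs.
    """
    n = min(len(g1), len(g2))
    segments = []
    start = None
    for idx in range(n):
        compatible = abs(g1[idx] - g2[idx]) < 2   # 0 vs 2 breaks a run
        if compatible and start is None:
            start = idx
        elif not compatible and start is not None:
            if idx - start >= min_snps:
                segments.append((start, idx - 1))
            start = None
    # Close a run that reaches the end of the data
    if start is not None and n - start >= min_snps:
        segments.append((start, n - 1))
    return segments

# Hypothetical genotypes for two samples at nine consecutive SNPs
sample_a = [2, 2, 1, 0, 2, 2, 2, 1, 0]
sample_b = [2, 1, 1, 2, 2, 1, 2, 2, 0]

print(candidate_ibd_segments(sample_a, sample_b, min_snps=4))
print(candidate_ibd_segments(sample_a, sample_b, min_snps=3))
```

Lowering the threshold lets shorter runs through, which is why results at very low thresholds are the most permissive test. Real tools work with hundreds or thousands of SNPs per segment and with genetic distances rather than raw SNP counts.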
http://img255.imageshack.us/img255/1281/selkupman.jpg
Above is a Selkup man, courtesy of Wikipedia. He reminds me a bit of Charles Bronson, who was a Lithuanian Tatar by descent. But anyway, by posting his photo, I'm not suggesting that those who show around 1-2% membership in the Selkup "North Eurasian" cluster in the spreadsheet above, including many Northern Europeans, had an ancestor like this at some point in time. Perhaps, but there's no way to tell at the moment, because all we're looking at via the stats above is allele sharing and various probabilities stemming from it. It's all too easy to get the same results for very different reasons, and in the end, it usually comes down to a best fit scenario based on the available resources (for example, reference samples).