Genes and Languages in the Caucasus



Franz
05-15-2011, 08:38 PM
http://dienekes.blogspot.com/2011/05/genes-and-languages-in-caucasus.html

UPDATE I (Genealogical rate, Gene-language concordance, Ossetes): I seriously don't know where to begin with this paper. So, given the serendipitous appearance of an abstract (http://dienekes.blogspot.com/2011/05/let-y-str-mutation-wars-begin.html) on Y-chromosome mutation rates, here is a major new pro-genealogical rate quote from the new paper:
We found that “evolutionary” estimates of most clusters fall far outside the range of the respective linguistic dates, while “genealogical” estimates gave a good fit with the linguistic dates. At least two population events in the Caucasus are documented archaeologically, which allows additional comparison with these “historical” dates. In both cases, the historical (archaeological) date is similar to a genetic estimate based on the “genealogical” mutation rate (Supplementary Note 2).
And, here's a comparison of the linguistic and genetic (based on Y-chromosomes) trees from the paper:

http://4.bp.blogspot.com/-ggmofUVKqtU/Tc8Ti5gbddI/AAAAAAAADuM/9Ik50PCfnFE/s1600/trees.png

The correspondence seems remarkable; the only major discrepancy is for the Iranic (Indo-European) Ossetes (http://en.wikipedia.org/wiki/Ossetians), who group with NW Caucasians genetically. This makes sense, as the Ossetes are probably to a large extent NW Caucasians who underwent a language shift under the influence of the Alans (http://en.wikipedia.org/wiki/Alans).


Speaking of the Ossetes, their negligible R1a1-M198 frequency (0.4-0.8%) should be a warning that Iranic steppe nomads _do not equal_ R1a1. While a limited contribution of Alans to the Ossetes is expected, it is not expected that Ossetes will have two of the lowest M198 frequencies in the Caucasus: in all probability R1a1 was not particularly important among Alans, and, by implication (?), Sarmatians.


UPDATE II (4 haplogroups for 4 language families):

The most interesting discovery in this paper is, of course, the correspondence between Y-chromosome haplogroups and language groups, thanks to the very large number of individuals tested and the deep phylogenetic resolution of the haplogroups:
Overall, the most frequent haplogroups in the Caucasus were G2a3b1-P303 (12%), G2a1a-P18 (8%), J1*-M267(xP58) (34%), and J2a4b*-M67(xM92) (21%), which together encompassed 73% of the Y chromosomes, while the other 24 haplogroups identified in our study comprise the remaining 27% (Table 2). ... haplogroup G2a3b1-P303 comprised at least 21% (and up to 86%) of the Y chromosomes in the Shapsug, Abkhaz and Circassians ... haplogroup G2a1a-P18 comprised at least 56% (and up to 73%) of the Digorians and Ironians (both from the Central Caucasus Iranic linguistic group), while not being found at more than 12% (average 3%) in other populations... haplogroup J2a4b*-M67(xM92) comprised 51-79% of the Y chromosomes in the Ingush and three Chechen populations (North-East Caucasus, Nakh linguistic group), while, in the rest of the Caucasus, its frequency was not higher than 9% (average 3%) ... haplogroup J1*-M267(xP58) comprised 44-99% of the Avar, Dargins, Kaitak, Kubachi, and Lezghins (South-East Caucasus, Dagestan linguistic group) but was less than 25% in Nakh populations and less than 5% in the rest of Caucasus.
Interestingly, G2a3 is one of the lineages of early Central European farmers (http://dienekes.blogspot.com/2010/11/near-eastern-origin-of-european.html), and 2 medieval German knights (http://dienekes.blogspot.com/2009/06/y-chromosomes-from-7th-c-ergolding.html). G2 is also, curiously, one of the West Eurasian lineages that are found in very small quantities in India, especially among upper caste Hindus. We are beginning to make connections across space and time, even though the patterns are far from clear yet.


The prevalence of J1*-M267(xP58) in Dagestan is well known (or suspected) from previous studies. Notice that J-P58 (http://dienekes.blogspot.com/2009/10/emergence-and-dispersal-of-haplogroup-j.html), if we use the genealogical rate, has an age of ~5.4ky in Semitic groups, and this is in concordance with the 5,750 years ago (http://dienekes.blogspot.com/2009/08/bronze-age-origin-of-semitic-languages.html) origin of Semitic languages based on Bayesian phylogenetics. So, it is clear that one part of haplogroup J1 was prevalent in ancient Semitic groups, and another, disjoint part in ancient Dagestani groups.
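To see why the choice of mutation rate matters so much for dates like these, here is a minimal sketch of a simple linear, variance-based age estimate for a Y-STR cluster. It is not the paper's procedure. The two rates are commonly quoted ballpark values that I am assuming for illustration, and the generation length and the variance figure are made up.

```python
# Sketch only: linear variance-based dating of a Y-STR haplotype cluster.
# Every numeric value below is an assumption for illustration, not a figure from the paper.

GENEALOGICAL_RATE = 2.1e-3   # assumed pedigree-based rate, per locus per generation
EVOLUTIONARY_RATE = 6.9e-4   # assumed "evolutionary" effective rate, per locus per generation
YEARS_PER_GENERATION = 25    # assumed generation length


def cluster_age_years(mean_str_variance, rate_per_generation,
                      years_per_generation=YEARS_PER_GENERATION):
    """Repeat-count variance grows roughly linearly with time, so t = variance / rate."""
    if rate_per_generation <= 0:
        raise ValueError("mutation rate must be positive")
    generations = mean_str_variance / rate_per_generation
    return generations * years_per_generation


# Hypothetical mean per-locus variance of repeat counts within one cluster.
hypothetical_variance = 0.42

genealogical_age = cluster_age_years(hypothetical_variance, GENEALOGICAL_RATE)
evolutionary_age = cluster_age_years(hypothetical_variance, EVOLUTIONARY_RATE)

print(f"genealogical rate: ~{genealogical_age:,.0f} years")
print(f"evolutionary rate: ~{evolutionary_age:,.0f} years")
print(f"evolutionary / genealogical: {evolutionary_age / genealogical_age:.1f}x")
```

Under these assumed rates the evolutionary estimate comes out about three times older than the genealogical one whatever the variance is, which is why only one of the two can line up with a given linguistic or archaeological date.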


To make things more interesting, the Nakh groups (Ingush and Chechens) have J2a4b*-M67(xM92) as their modal haplogroup. Nakh is also a Northeast Caucasian language subfamily, like Dagestani, and indeed NE Caucasian is also called Nakho-Daghestanian (http://en.wikipedia.org/wiki/Northeast_Caucasian_languages). What did the early speakers of this family look like?


It would be tempting to think that Proto-Nakho-Dagestanians were J1-dominated, as J1 exists in both Nakh (16-25%) and Dagestani (58-99%) groups, whereas J2a4b-M67 (the Nakh modal haplogroup) is nearly completely absent in Dagestanians.


UPDATE III (No European influence):

Another interesting discovery of this study is the lack of European influence in the populations of the North Caucasus.

http://1.bp.blogspot.com/-V700ZneaFEo/Tc-qAunJh8I/AAAAAAAADuU/l52gW61JtqI/s1600/don_river.png


It seems that both R1a1a-M198 and I2a-P37 meet a major barrier to their eastward spread at the Don river. Please note that the former is not strictly a European haplogroup, but it nonetheless experiences a massive drop in frequency, and is negligible everywhere except in Abkhaz-Circassians (NW Caucasus; 10.3-19.7%), with an outlier in Dargins (22%).


This seems to put a limit on the origin of any hypothetical movements across the Eurasian steppe east of the Don river, as haplogroup I2a-P37 is largely absent in Central Asia, and occurs 3 times in 1,525 individuals in this sample. So, while there have been proposals of a Central European origin of some steppe pastoralist groups, these are hard to reconcile with this picture.
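For a sense of just how rare I2a-P37 is here, the count quoted above (3 in 1,525) can be turned into a frequency with a rough uncertainty range. The sketch below uses a Wilson score interval, which is my own choice and not something taken from the paper; the function and variable names are made up for the example.

```python
import math

# Counts quoted in the text above.
I2A_COUNT = 3
SAMPLE_SIZE = 1525


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


frequency = I2A_COUNT / SAMPLE_SIZE
low, high = wilson_interval(I2A_COUNT, SAMPLE_SIZE)

print(f"I2a-P37 frequency: {frequency:.2%}")
print(f"approx. 95% interval: {low:.2%} to {high:.2%}")
```

Even the upper end of that range stays well under 1%, so the "largely absent" reading does not hinge on the exact count.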


UPDATE IV (Haplogroup G):

Two of the modal haplogroups in this paper are G2a1a-P18 (Iranic, 56-73%) and G2a3b1-P303 (NW Caucasians, 21-86%). Battaglia et al. (2008) (http://www.nature.com/ejhg/journal/v17/n6/abs/ejhg2008249a.html) also found a high frequency of G2a* in Georgians and Balkars (~30%, also modal in both populations). It appears that G2a is a mainly West (both NW and SW) Caucasian phenomenon within the context of this region.


UPDATE V (Starostin and Language depth):


The authors applied the methodology of the late Sergei Starostin (http://en.wikipedia.org/wiki/Sergei_Anatolyevich_Starostin) to the problem of language time depth:
The present work employs Starostin’s methodology, and we made special efforts to create the high-quality linguistic databases required for this analysis. Thus, based on significantly extended and revised linguistic databases, we have applied a glotto-chronological approach to the North Caucasian languages. As a result, our study provides a unique opportunity to make direct comparisons of linguistic and genetic data from the same populations. Lexico-statistical methods have also been applied to a number of language families using a Bayesian approach to increase the statistical robustness of language classification (Gray and Atkinson, 2003; Kitchen et al., 2009; Greenhill et al., 2010). Using these methods with the Caucasus languages under study here will be the focus of future work.
It will certainly be interesting to see Bayesian phylogenetic methods (http://dienekes.blogspot.com/2011/04/indo-european-origins-neolithic.html) applied to the Caucasus languages in the future, using the linguistic datasets developed here. The concordance of genetic-linguistic results in this paper, in addition to the many successes of the G&A approach, is making it increasingly difficult to doubt our ability to estimate the age of language families in much the same way that biologists estimate the age of genetic variation.


See also the Tower of Babel (http://starling.rinet.ru/main.html) project and the Evolution of Human Languages (http://ehl.santafe.edu/main.html) project at the Santa Fe Institute.


UPDATE VI (Haplogroup J2a):

I have recently speculated about a possible link between the Caucasus region and India based on the appearance of a "Dagestan" component (http://dienekes.blogspot.com/2010/12/solution-to-problem-of-indo-aryan.html) in India, the clear West (http://dienekes.blogspot.com/2011/05/solution-to-problem-of-indo-aryan.html) Asian (http://dienekes.blogspot.com/2011/05/beware-of-sample-sizes-why-ancestral.html) origin of Ancestral North Indians, as well as a possible linguistic link between Northeast Caucasian, Hurrian, and Indo-European.


A problem with that theory is that the high J1*(xP58) frequency in Dagestan has no counterpart in South Asia. The current study, however, adds data on the Nakh part of the Nakho-Dagestanian (Northeast Caucasian) family, showing this to be J2a4b-M67 dominated. So, while I think that J1*(xP58) may have been present among Proto-Northeast Caucasians, these must have interacted with J2a folk.


J-M67 is clearly intrusive into the Central Caucasus, from the South where a much greater variety of J2a-related lineages is observed among Armenians (http://www.familytreedna.com/public/ArmeniaDNAProject/default.aspx?section=results), North Iranians (http://dienekes.blogspot.com/2006/06/y-chromosomes-of-iran.html), and Anatolian Turks (http://hpgl.stanford.edu/publications/HG_2004_v114_p127-148.pdf).


We now have good coverage of J2a in the entirety of the West Asian region, with the exception of Azerbaijan, and a few patterns are beginning to emerge:


The center of the J2a world is somewhere between eastern Turkey, Armenia, Azerbaijan, Iran, and Syria.
The Caucasus is a northern extension of this world, just as Greece and Italy are its main western extensions, with a strong extension into Central Asia as far as Xinjiang, and well into South Asia all the way to upper caste South Indian Hindus.
In the Caucasus itself J-M67 dominates among Nakh speakers, but with little other J2a-related variation.
In comparison to Nakhs, J2a seems more varied in Georgians (http://www.genos.hr/data/2008_European_Journal_of_Human_Genetics_Battaglia_Fornarino_Y-chromosomal_evidence.pdf), among Ossetes, and among NW Caucasian speakers.


It is hard to make any pronouncements on how J2a spread northwards from its Transcaucasian cradle, but I would think that the Kura-Araxes and Maikop cultures are fairly good candidates for that spread, with the former being J2a dominated, and the latter being more G2a dominated. I would not, however, dismiss a more recent spread of J2a into the region.


UPDATE VII (Absence of E1b1b1):

This haplogroup has a more Mediterranean distribution and is all but absent in the North Caucasus. Unfortunately no downstream markers were typed, but (a) its presence in small amounts in NW Caucasians (1-1.7%), together with a similarly low frequency (1.5%) in Georgians, and (b) its absolute absence among Nakho-Dagestanians, except for one Lezghin, suggest to me that it arrived in the region from the west and is probably a low-frequency trace of the Ancient Greek colonies of the Black Sea, just as it is associated with Greek colonists in the West Mediterranean (http://dienekes.blogspot.com/2011/03/coming-of-greeks-to-provence-and.html) and Sicily (http://dienekes.blogspot.com/2008/08/sicilian-y-chromosomes-greek-and-north.html).


UPDATE VIII (Haplogroups L and T):

There is a little haplogroup L in the North Caucasus. L-M27 and L-M317 seem concentrated in the Northwest, while L-M357 is found only in Nakh speakers. The detection of L-M357 in North but not South Iran (http://dienekes.blogspot.com/2006/06/y-chromosomes-of-iran.html) may be related to this population, and also to the L-rich population of Syria (http://www.ncbi.nlm.nih.gov/pubmed/19686289), especially from the eastern inland area.


Haplogroup T has been the subject of a major recent paper (http://dienekes.blogspot.com/2011/04/new-paper-on-y-chromosome-haplogroup-t.html). In this region, it is found in 2 NW Caucasians, 1 Ossete and a couple of Lezghins, but unfortunately with no fine phylogenetic resolution.


Mol Biol Evol (2011) doi: 10.1093/molbev/msr126

Parallel Evolution of Genes and Languages in the Caucasus Region

Oleg Balanovsky, Khadizhat Dibirova, Anna Dybo, Oleg Mudrak, Svetlana Frolova, Elvira Pocheshkhova, Marc Haber, Daniel Platt, Theodore Schurr, Wolfgang Haak, Marina Kuznetsova, Magomed Radzhabov, Olga Balaganskaya, Alexey Romanov, Tatiana Zakharova, David F. Soria Hernanz, Pierre Zalloua, Sergey Koshel, Merritt Ruhlen, Colin Renfrew, R. Spencer Wells, Chris Tyler-Smith, Elena Balanovska and The Genographic Consortium

We analyzed 40 SNP and 19 STR Y-chromosomal markers in a large sample of 1,525 indigenous individuals from 14 populations in the Caucasus and 254 additional individuals representing potential source populations. We also employed a lexicostatistical approach to reconstruct the history of the languages of the North Caucasian family spoken by the Caucasus populations. We found a different major haplogroup to be prevalent in each of four sets of populations that occupy distinct geographic regions and belong to different linguistic branches. The haplogroup frequencies correlated with geography and, even more strongly, with language. Within haplogroups, a number of haplotype clusters were shown to be specific to individual populations and languages. The data suggested a direct origin of Caucasus male lineages from the Near East, followed by high levels of isolation, differentiation and genetic drift in situ. Comparison of genetic and linguistic reconstructions covering the last few millennia showed striking correspondences between the topology and dates of the respective gene and language trees, and with documented historical events. Overall, in the Caucasus region, unmatched levels of gene-language co-evolution occurred within geographically isolated populations, probably due to its mountainous terrain.

Link (http://mbe.oxfordjournals.org/content/early/2011/05/13/molbev.msr126.short?rss=1)

Franz
05-18-2011, 08:32 PM
http://i.imgur.com/hE9yp.png
http://i.imgur.com/3IlUs.jpg
http://i.imgur.com/9wpSh.jpg