Kamal900
01-15-2015, 12:14 PM
Jason A. Hodgson, Connie J. Mulligan, Ali Al-Meeri, Ryan L. Raaum. Early Back-to-Africa Migration into the Horn of Africa. PLoS Genetics, June 12, 2014.
Genetic studies have identified substantial non-African admixture in the Horn of Africa (HOA). In the most recent genomic
studies, this non-African ancestry has been attributed to admixture with Middle Eastern populations during the last few
thousand years. However, mitochondrial and Y chromosome data are suggestive of earlier episodes of admixture. To
investigate this further, we generated new genome-wide SNP data for a Yemeni population sample and merged these new
data with published genome-wide genetic data from the HOA and a broad selection of surrounding populations. We used
multidimensional scaling and ADMIXTURE methods in an exploratory data analysis to develop hypotheses on admixture
and population structure in HOA populations. These analyses suggested that there might be distinct, differentiated African
and non-African ancestries in the HOA. After partitioning the SNP data into African and non-African origin chromosome
segments, we found support for a distinct African (Ethiopic) ancestry and a distinct non-African (Ethio-Somali) ancestry in
HOA populations. The African Ethiopic ancestry is tightly restricted to HOA populations and likely represents an
autochthonous HOA population. The non-African ancestry in the HOA, which is primarily attributed to a novel Ethio-Somali
inferred ancestry component, is significantly differentiated from all neighboring non-African ancestries in North Africa, the
Levant, and Arabia. The Ethio-Somali ancestry is found in all admixed HOA ethnic groups, shows little inter-individual
variance within these ethnic groups, is estimated to have diverged from all other non-African ancestries by at least 23 ka,
and does not carry the unique Arabian lactase persistence allele that arose about 4 ka.
Non-African ancestry in the HOA
The ADMIXTURE-derived hypothesis that non-African ancestry in the HOA derives from admixture with a population or populations with high levels of the Arabian and Maghrebi IACs and some of the Eurasian IAC (hypothesis 2A above) suggests that HOA populations should have higher levels of shared gene identity with populations with higher proportions of those ancestries. To evaluate this prediction, we examined the relationship between shared gene identity and the ADMIXTURE-estimated proportion of the Arabian, Eurasian, and Maghrebi IACs in MENA population samples for each of the non-African ancestry partitions of the admixed HOA populations using varying intercepts linear models. Only the Maghrebi IAC analysis shows the expected relationship: shared gene identity between HOA and MENA populations increases as the proportion of Maghrebi ancestry increases (Figure 4A).
Figure 4A:
http://i155.photobucket.com/albums/s310/Kamal900/CpWz_002-1.png
Contrary to expectations, shared gene identity decreases between HOA populations and MENA populations as the proportion of the Arabian IAC (Figure 4B) and the Eurasian IAC (Figure 4C) increases.
Figure 4B:
http://i155.photobucket.com/albums/s310/Kamal900/CpWz_003-1.png
Figure 4C:
http://i155.photobucket.com/albums/s310/Kamal900/CpWz_004.png
Next, we looked for evidence for extended inter-population gene flow in the correlation of geographic distance and shared gene identity. We found no relationship between geographic and genetic distance within either HOA or MENA populations. We then examined this relationship for HOA populations to North African (Egypt, Mozabite), Levantine (Bedouin, Druze, Palestinian), and Arabian (Saudi Arabia, Yemen) populations (Figure S3).
For North Africa and Arabia, we calculated both straight-line distances and distances involving a waypoint through Egypt. The only group for which there is a clear gradient of genetic similarity decreasing with geographic distance is the Arabian populations, using straight-line distances (Mantel test, r = -0.74, p = 0.0033) (Figure 5A).
Figure 5A:
http://i155.photobucket.com/albums/s310/Kamal900/CpWz_005.png
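For readers unfamiliar with the Mantel test mentioned above, here is a minimal sketch of the classic permutation version for two square distance matrices. It is not the authors' code. The paper correlates distances between two different sets of populations (HOA and Arabian), and its exact procedure is not given in this excerpt. The function names (`pearson`, `upper_triangle`, `mantel`) and the toy positions are hypothetical, chosen only for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries of a square matrix after relabelling rows/columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """Return (r, p): matrix correlation and a two-sided permutation p-value."""
    n = len(dist_a)
    identity = list(range(n))
    a = upper_triangle(dist_a, identity)
    observed = pearson(a, upper_triangle(dist_b, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute population labels of the second matrix only
        r = pearson(a, upper_triangle(dist_b, perm))
        if abs(r) >= abs(observed) - 1e-12:
            hits += 1
    # Count the observed arrangement itself as one of the permutations.
    return observed, (hits + 1) / (permutations + 1)


# Toy data (invented): five populations on a line, with similarity falling
# linearly as geographic distance grows.
positions = [0, 1, 3, 6, 10]
geo = [[abs(a - b) for b in positions] for a in positions]
sim = [[1 - 0.05 * abs(a - b) for b in positions] for a in positions]
r, p = mantel(geo, sim, permutations=199, seed=1)
print(r, p)
```

A negative r, as in the quoted result, means genetic similarity drops as geographic distance increases.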
This relationship between genetic and geographic distance between HOA and Arabian populations might support a hypothesis of long-term equilibrium gene flow among these populations in an isolation-by-distance model. However, if this hypothesis were true, we would expect the highest levels of pairwise gene identity to be between HOA and Arabian populations, but this is not the case. The highest levels of shared gene identity are between HOA populations and the Levantine Palestinian and the North African Mozabite population samples (Figure 5B).
Figure 5B:
http://i155.photobucket.com/albums/s310/Kamal900/CpWz_006.png
Thus, it is more likely that the genetic-geographic HOA-Arabia distance gradient reflects secondary admixture of Arabian migrants into HOA populations already carrying substantial non-African ancestry or already admixed HOA populations sending migrants into Arabian populations.
This observation suggests a geographical structuring between the Amhara, Tygray, Oromo, and Afar in the Ethiopian highlands, the Somali in eastern Ethiopia and the Somalia lowlands, and the Ari in the southwestern Ethiopian Rift. AMOVA of these three population groups reveals significant between-group differentiation (ΦGT = 0.017, p < 0.0001). In addition, the population tree with these geographic subgroups (Figure 5C) is a significantly better fit to the data than the tree without subgroups (K = 126, df = 1, p ≈ 0).
Within MENA populations, linguistic subgroups cannot be defined, so we tested several historic/geographic groupings. Between-population differentiation was maximized in the AMOVA analysis with three subgroups: the northwest African Mozabite; the ethnic and religious isolate Druze; and the populations with histories entwined with the development and expansion of Islam - the Egyptians, Palestinians, Bedouin, Saudi Arabians, and Yemeni. For this set of subgroups, between-population differentiation was statistically significant (ΦGT = 0.011, p < 0.0001) and the population tree with these subgroups (Figure 5D) is a significantly better fit to the data than the tree without subgroups (K = 67, df = 1, p = 3.3 × 10^-16).
Figure 5C & D:
http://i155.photobucket.com/albums/s310/Kamal900/CpWz_007.png
Figure 5E:
http://i155.photobucket.com/albums/s310/Kamal900/CpWz_008.png
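As a rough sanity check on the tree-comparison p-values quoted above, the sketch below converts a test statistic with 1 degree of freedom into an upper-tail chi-square probability. It assumes the reported statistic is referred to a chi-square distribution, which this excerpt does not state explicitly. The function name `chi2_sf_df1` is hypothetical.

```python
import math


def chi2_sf_df1(statistic):
    """Upper-tail probability P(X > statistic) for chi-square with 1 df.

    With 1 df, X is the square of a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if statistic < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(statistic / 2.0))


# Statistics quoted in the text for the MENA and HOA subgroup trees.
for stat in (67, 126):
    print(stat, chi2_sf_df1(stat))
```

Under that assumption, a statistic of 67 gives a p-value on the order of 10^-16, and 126 gives a value so small that it is effectively zero.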
Finally, putting all of the populations together in an AMOVA analysis, we find significant differences between HOA and MENA subgroups at both a macro level (HOA vs MENA) (ΦGT = 0.014, p < 0.0001) and a micro level (all of the individual subgroups identified above) (ΦGT = 0.016, p < 0.0001).
Relationship to the North African back-to-Africa migration
Like the Ethio-Somali, the Maghrebi IAC in North African populations derives from an early back-to-Africa migration [34,43,61,99-102]. Studies of North African populations reveal a complex layered history of admixture in North Africa, with an inferred pre-Last Glacial Maximum settlement of North Africa by a non-African population followed by gene flow from European, Middle Eastern, and sub-Saharan African populations dating from the end of the LGM to the recent past [43,103-105].
A single prehistoric migration of both the Maghrebi and the Ethio-Somali back into Africa is the most parsimonious hypothesis. That is, a common ancestral population migrated into northeast Africa through the Sinai and then split into two, with one branch continuing west across North Africa and the other heading south into the HOA. For the Ethio-Somali, the lowest FST value from the ADMIXTURE-estimated ancestral allele frequencies is with the Maghrebi (Text S1), which is consistent with a common origin hypothesis. In contrast, the Maghrebi component has lower FST values with Arabian, European, and Eurasian ancestral populations than with the Ethio-Somali, which suggests that the Maghrebi diverged most recently from those populations, and might indicate separate back-to-Africa migrations for the Ethio-Somali and the Maghrebi. Unfortunately, the FST estimates alone are not robust enough to distinguish between single or separate back-to-Africa migrations.
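To make the FST comparison above concrete, here is a minimal sketch of a textbook two-population FST computed from allele frequencies at biallelic SNPs, as (HT - HS) / HT summed over loci. This is an illustration only. It is not necessarily the estimator used in the paper or by ADMIXTURE, and the function name `fst_two_pops` and the example frequencies are invented.

```python
def fst_two_pops(freqs_1, freqs_2):
    """Wright-style FST for two equally weighted populations.

    freqs_1 and freqs_2 hold the frequency of the same allele at each SNP.
    HS is the mean within-population expected heterozygosity and HT is the
    expected heterozygosity of the pooled population; both are summed over
    loci before taking the ratio.
    """
    if len(freqs_1) != len(freqs_2):
        raise ValueError("frequency lists must have the same length")
    total_ht = 0.0
    total_hs = 0.0
    for p1, p2 in zip(freqs_1, freqs_2):
        p_bar = (p1 + p2) / 2.0
        total_ht += 2.0 * p_bar * (1.0 - p_bar)
        total_hs += (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if total_ht == 0.0:
        raise ValueError("no polymorphic loci: FST is undefined")
    return (total_ht - total_hs) / total_ht


# Invented frequencies: the first pair of populations is more similar
# than the second, so it yields the lower FST.
close_pair = fst_two_pops([0.40, 0.55, 0.10], [0.45, 0.50, 0.15])
distant_pair = fst_two_pops([0.40, 0.55, 0.10], [0.80, 0.20, 0.60])
print(close_pair, distant_pair)
```

Lower FST between two ancestry components means their allele frequencies are more alike, which is the sense in which the text reads a low Ethio-Somali/Maghrebi value as consistent with a common origin.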
A later migration of a subset of this population back to the Levant before 6 ka would account for a Levantine origin of the Semitic languages [18] and the relatively even distribution of around 7% Ethio-Somali ancestry in all sampled Levantine populations (Table S6). Later migration from Arabia into the HOA beginning around 3 ka would explain the origin of the Ethiosemitic languages at this time [18], the presence of greater Arabian and Eurasian ancestry in the Semitic-speaking populations of the HOA (Table 2, S6), and ROLLOFF/ALDER estimates of admixture in HOA populations between 1-5 ka (Table 1).
Summary and implications
We find that most of the non-African ancestry in the HOA can be assigned to a distinct non-African origin Ethio-Somali ancestry component, which is found at its highest frequencies in Cushitic- and Semitic-speaking HOA populations (Table 2, Figure 2). In addition to verifying that most HOA populations have substantial non-African ancestry, which is not controversial [11-14,16], we argue that the non-African origin Ethio-Somali ancestry in the HOA is most likely pre-agricultural. In combination with the genomic evidence for a pre-agricultural back-to-Africa migration into North Africa [43,61] and inference of pre-agricultural migrations in and out of Africa from mitochondrial and Y chromosome data [13,32-37,47,99-102], these results contribute to a growing body of evidence for migrations of human populations in and out of Africa throughout prehistory [57] and suggest that human hunter-gatherer populations were much more dynamic than commonly assumed.
We close with a provisional linguistic hypothesis. The proto-Afro-Asiatic speakers are thought to have lived either in the area of the Levant or in east/northeast Africa [8,107,108]. Proponents of the Levantine origin of Afro-Asiatic tie the dispersal and differentiation of this language group to the development of agriculture in the Levant beginning around 12 ka [8,109,110]. In the African-origins model, the original diversification of the Afro-Asiatic languages is pre-agricultural, with the source population living in the central Nile valley, the African Red Sea hills, or the HOA [108,111]. In this model, later diversification and expansion within particular Afro-Asiatic language groups may be associated with agricultural expansions and transmissions, but the deep [...]
[...] population representation, which allows for greater overlap of mutually typed SNPs across studies. The 90K dataset includes data for 91,101 SNPs from HOA, HapMap3, HGDP, and North Africa populations. The 260K dataset includes data for 259,257 SNPs from the HOA, HapMap3, HGDP, southern Africa, and selected West Asian populations (see Table S1 for populations in the 90K and 260K datasets).
More --> http://www.plosgenetics.org/article/fetchObject.action?uri=info%3Adoi%2F10.1371%2Fjournal.pgen.1004393&representation=PDF