Kamal900
06-18-2015, 10:54 PM
East Africa is a strategic region to study human genetic diversity due to the presence of ethnically, linguistically, and geographically diverse populations. Here, we provide new insight into the genetic history of populations living in the Sudanese region of East Africa by analysing nine ethnic groups belonging to three African linguistic families: Niger-Kordofanian, Nilo-Saharan and Afro-Asiatic. A total of 500 individuals were genotyped for 200,000 single-nucleotide polymorphisms. Principal component analysis, clustering analysis using ADMIXTURE, FST statistics, and the three-population test were used to investigate the underlying genetic structure and ancestry of the different ethno-linguistic groups. Our analyses revealed a genetic component for Sudanese Nilo-Saharan speaking groups (Darfurians and part of Nuba populations) related to Nilotes of South Sudan, but not to other Sudanese populations or other sub-Saharan populations. Populations inhabiting the North of the region showed close genetic affinities with North Africa, with a component that could be a remnant of North Africans before the migrations of Arabs from Arabia. In addition, we found very low genetic distances between populations in genes important for anti-malarial and anti-bacterial host defence, suggesting similar selective pressures on these genes and stressing the importance of considering functional pathways to understand the evolutionary history of populations.
We applied a principal component analysis (PCA) to investigate the population structure of the new populations genotyped in this study from the Sudanese region (Supplementary Fig. S1a). PC1 (3.56% of the variation) follows a North-South cline and separates populations inhabiting the region between the Nile River and the Red Sea (Nubians and Arabs along the Nile, Beja and Ethiopians along the coast) from Darfurians and Nuba of South-West Sudan, and Nilotes of South Sudan. Copts form a separate group close to the North-East populations but in a more outlying position: they are the extreme of the northern genetic component. PC2 (0.7%) separates the nomadic Fulani from the other populations.
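For readers who want to see what a PCA on SNP data involves, here is a minimal sketch in Python. It is not the pipeline used in the paper: the function name `genotype_pca`, the simulated allele frequencies and the two made-up populations are all assumptions for illustration. Genotypes are coded as 0/1/2 allele counts, each SNP is centred and scaled by its binomial standard deviation, and the "% of the variation" for each PC is its share of the squared singular values.

```python
import numpy as np

def genotype_pca(genotypes, n_components=2):
    """PCA on an (individuals x SNPs) matrix of 0/1/2 allele counts."""
    g = np.asarray(genotypes, dtype=float)
    freq = g.mean(axis=0) / 2.0              # sample allele frequency per SNP
    keep = (freq > 0) & (freq < 1)           # drop SNPs that do not vary
    g, freq = g[:, keep], freq[keep]
    # Centre each SNP and scale by its binomial standard deviation.
    z = (g - 2 * freq) / np.sqrt(2 * freq * (1 - freq))
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # fraction of variance per PC
    return u[:, :n_components] * s[:n_components], explained[:n_components]

# Hypothetical data: two simulated populations with diverged frequencies.
rng = np.random.default_rng(0)
n_snps = 500
freq_a = rng.uniform(0.1, 0.9, n_snps)
freq_b = np.clip(freq_a + rng.normal(0, 0.2, n_snps), 0.05, 0.95)
pop_a = rng.binomial(2, freq_a, size=(20, n_snps))
pop_b = rng.binomial(2, freq_b, size=(20, n_snps))

scores, explained = genotype_pca(np.vstack([pop_a, pop_b]))
mean_a = scores[:20, 0].mean()               # average PC1 score, population A
mean_b = scores[20:, 0].mean()               # average PC1 score, population B
print("PC1 group means:", mean_a, mean_b)
print("variance explained:", explained)
```

In this toy setup the two groups fall on opposite sides of PC1, which is the same kind of separation the paper reports between its North-East and South-West clusters.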
Next, we combined our new populations (140K data set) with previously studied populations of special interest for this analysis: Qatar [12], Egypt [13], and three sub-Saharan populations (Luhya, Yoruba and Maasai) from the 1000 Genomes Project [14], to have external references both north and south of the Sudanese region. This new data set contains 14,343 SNPs (14K data set). Even though the number of SNPs in this second set is small, it is enough to differentiate components in the African genetic landscape [15]. Fig. 2 shows a PCA of this extended data set, where East African populations are distinct from both sub-Saharan and North African populations. PC1 (6.08%) separates populations from North Africa/Middle East from those of sub-Saharan Africa (Fig. 2a). Copts are closer to North African and Middle East populations but remain a separate cluster when PC2 is considered. PC2 (1.46%), along with PC1, separates the two homogeneous clusters of North-East and South-West populations: Nubians, Arabs, Beja and Ethiopians on the one hand, and Nuba, Darfurians and Nilotes on the other. PC2 separates all Sudanese and Ethiopian populations from the rest. PC3 (0.56%) differentiates West African populations (Fulani and Yoruba) from sub-Saharan East African populations (Maasai) (Fig. 2b). Both PC analyses, which use data sets with different numbers of SNPs, preserve the topology of the populations. As expected, with a low number of SNPs we observe higher intra-population variation (Supplementary Fig. S1b).
To infer the ancestral populations of the East African individuals, we ran ADMIXTURE from k = 2 to k = 10 in the 14 populations (the analysis for the internal nine populations is presented in Supplementary Figs. S7 and S10). We analysed the results from k = 2 to k = 5, as higher numbers of ancestral components do not have a clear origin. A complex pattern of admixture is observed in East African populations (Fig. 3). At k = 2, we already detect different ancestries in the Sudanese populations. Copts show a common ancestry with North African and Middle Eastern populations (dark blue), whereas the South-West cluster (Darfurians, Nuba and Nilotes) shares an ancestry component (light blue) with sub-Saharan samples. The North-East cluster (Beja, Ethiopians, Arabs and Nubians) shows both components, although the main component (~70%) is the one detected in North Africa and the Middle East (Fig. 3). At k = 3 (the best statistically supported model, see Supplementary Fig. S8b), a new component (light green) appears, well differentiated from other sub-Saharan, North African and Middle Eastern populations. This component defines South-West Sudanese populations (Nuba and Darfurians) and Nilotes of South Sudan, and is different from the main sub-Saharan component as seen in Yoruba and Luhya.
This Nilo-Saharan component, which is also found at a lower percentage in the North-East cluster and Maasai, will be outlined in the discussion.
Copts share the same main ancestral component as North African and Middle East populations (dark blue), supporting a common origin with Egypt (or other North African/Middle Eastern populations). They are known to be the most ancient population of Egypt, and at k = 4 (Fig. 3) they show their own component (dark green), different from the current Egyptian population, which is closer to the Arabic population of Qatar.
The case of the Fulani is noteworthy: they feature more Sudanese ancestry (>45%) than North African (<40%) or sub-Saharan (<15%) ancestry, and at k = 5 show their own component (Fig. 3). They have a high individual component variance, suggesting a recent admixture event in this population.
To formally test the results of the admixture analysis, we applied the three-population test (f3 statistics) [16]. We used all possible pairs of populations as surrogates of the ancestral populations of each ethno-linguistic group. All populations that have a complex pattern of admixture (Fig. 3) showed statistically significant results (Z-score < −4, p-value < 3.2 × 10^−5): those of the North-East cluster (Beja, Ethiopians, Arabs and Nubians) and Fulani. The populations from the North-East cluster (Beja, Ethiopians, Arabs and Nubians; Table 2) may be explained as admixture products of an ancestral North African population (similar to Copts) and an ancestral South-West population (Nuba, although in one case Darfurians give a better fit). These four populations had an intermediate position between Copts and South-West Sudanese populations in both the PC and admixture analyses. Fulani, who are known to have West African ancestry, have a negative f3 with Copts and Yoruba as source populations (Table 2). As they have a complex history and present high levels of admixture with different populations and high individual variance, this three-population phylogeny seems too naïve to explain their complex population history. None of the South-West populations (Darfurians, Nuba and Nilotes) appear as admixed in the three-population test. This result fits the ADMIXTURE analysis (Fig. 3 and Supplementary Fig. S10) and confirms a specific ancestral component for these populations.
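To make the three-population test more concrete, here is a rough Python sketch of the idea, not the software the authors ran. The statistic f3(C; A, B) is the average over SNPs of (c − a)(c − b), where a, b and c are allele frequencies; a significantly negative value suggests C is a mixture of populations related to A and B. The function names, the simulated frequencies and the 60/40 mixing proportions are assumptions for illustration. The sketch works on exact population frequencies, so it skips the finite-sample correction that real tools apply, and it uses a simple equal-sized block jackknife for the Z-score. The last helper checks that a one-sided normal tail at Z = −4 matches the p-value threshold quoted above.

```python
import math
import numpy as np

def f3_with_z(target, source1, source2, n_blocks=20):
    """f3(target; source1, source2) and a block-jackknife Z-score."""
    c, a, b = (np.asarray(x, dtype=float) for x in (target, source1, source2))
    terms = (c - a) * (c - b)                 # per-SNP contribution
    estimate = terms.mean()
    blocks = np.array_split(terms, n_blocks)  # contiguous SNP blocks
    # Re-estimate f3 leaving out one block at a time.
    loo = np.array([(terms.sum() - blk.sum()) / (terms.size - blk.size)
                    for blk in blocks])
    se = math.sqrt((n_blocks - 1) / n_blocks * np.sum((loo - loo.mean()) ** 2))
    return estimate, estimate / se

def lower_tail_p(z):
    """One-sided p-value P(Z <= z) under a standard normal."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

# Hypothetical allele frequencies: C is a 60/40 mixture of A and B.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.05, 0.95, 2000)
freq_b = rng.uniform(0.05, 0.95, 2000)
freq_c = 0.6 * freq_a + 0.4 * freq_b

f3_admixed, z_admixed = f3_with_z(freq_c, freq_a, freq_b)   # target is mixed
f3_control, z_control = f3_with_z(freq_a, freq_c, freq_b)   # target is a source
print("admixed target:", f3_admixed, z_admixed)
print("unadmixed target:", f3_control, z_control)
print("p-value at Z = -4:", lower_tail_p(-4.0))
```

With the mixed population as target the statistic is negative, and with a source population as target it is positive, which mirrors why the North-East cluster and Fulani test as admixed while the South-West populations do not.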
In this study we present an extensive genome-wide data set characterizing East African human genetic diversity in populations from Sudan, South Sudan and Ethiopia. We further analyse the Nilo-Saharan ancestral component within the variation of sub-Saharan Africans. This component belongs linguistically to Eastern Sudanic languages and geographically to the South and West of Sudan and to South Sudan, including highly diverse ethnic groups in a similar genetic background. This component was identified in previous studies using Nilotic populations, but it was not analysed in other Nilo-Saharan populations, such as Darfurians or the Nuba people. In addition, we show convergent evolutionary pressures exerted on genes involved in anti-malaria and anti-bacterial host defence processes.
Africa's genetic landscape is shaped by geographic barriers [19], but the forces clustering populations vary depending on the scale. On a regional scale, East African populations cluster mainly by linguistic affiliation [5]. However, it has been previously reported that language plays a lesser role in the genetic clustering of Sudanese populations, as geography is the main factor that groups them [10]. This observation is supported by our data, as shown in the PCA (Fig. 2), where PC1 represents a north-east to south-west axis delimited by the Nile River and its main tributaries: the Blue Nile and the White Nile. Genetic and geographic distances between populations of the Sudanese region are positively correlated (Mantel test; r = 0.5105, p-value < 0.0001), with Sudanese populations clustering in four groups according to their geographic location (Supplementary Fig. S1).
Nubians are the only Nilo-Saharan speaking group that does not cluster with groups of the same linguistic affiliation, but with Sudanese Afro-Asiatic speaking groups (Arabs and Beja) and Afro-Asiatic Ethiopians (Supplementary Fig. S1a). Y-chromosome and mitochondrial DNA studies reported Nubians to be more similar to Egyptians than to other Nilo-Saharan populations [1,8]: Nubians were influenced by Arabs as a direct result of the penetration of large numbers of Arabs into the Nile Valley over long periods of time following the arrival of Islam around 651 A.D. [20].
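The Mantel test quoted above (r = 0.5105) asks whether two distance matrices over the same populations are correlated. A minimal Python sketch of the usual permutation version follows; the function name `mantel`, the random coordinates and the noise level are assumptions for illustration, and nothing here reproduces the paper's data. The pairwise distances in the upper triangles are correlated, then the populations in one matrix are shuffled many times to see how often a correlation at least as large arises by chance.

```python
import numpy as np

def mantel(dist_x, dist_y, n_perm=999, seed=0):
    """One-sided Mantel test for a positive correlation between two
    symmetric distance matrices over the same set of populations."""
    x = np.asarray(dist_x, dtype=float)
    y = np.asarray(dist_y, dtype=float)
    iu = np.triu_indices_from(x, k=1)         # each pair counted once
    r_obs = np.corrcoef(x[iu], y[iu])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        p = rng.permutation(x.shape[0])       # relabel the populations
        r_perm = np.corrcoef(x[p][:, p][iu], y[iu])[0, 1]
        hits += int(r_perm >= r_obs)
    return r_obs, (hits + 1) / (n_perm + 1)

# Hypothetical example: nine populations at random map coordinates, with
# "genetic" distance equal to geographic distance plus symmetric noise.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 10, size=(9, 2))
geo = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
noise = rng.normal(0, 0.5, size=geo.shape)
gen = geo + (noise + noise.T) / 2
np.fill_diagonal(gen, 0.0)

r, p_value = mantel(geo, gen)
print("Mantel r:", r, "p-value:", p_value)
```

Note that the smallest p-value this procedure can report is 1 / (n_perm + 1), so a result like p < 0.0001 needs at least 10,000 permutations.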
We also found this relationship between Nilo-Saharan Sudanese populations and other Nilo-Saharan populations from Kenya (Maasai), but it is not as strong: Maasai show their own genetic component at k = 6, which is different from the Sudanese component (Supplementary Fig. S7), and do not cluster with our Nilo-Saharan speaking populations. In a previous Y-chromosome study [8], most Nilo-Saharan speaking populations, except Nubians, showed little evidence of gene flow with other Sudanese populations.
The presence of the core of Nilo-Saharan languages at the confluence of the two Nile rivers suggests that the Sudanese region is the place of origin of the Nilo-Saharan linguistic family despite its fragmented distribution, as shown by the location of the Nubian language [21,22]. It is interesting to note that Nuba populations constitute a homogeneous group, even though some speak Kordofanian (of the Niger-Kordofanian family) and others speak different languages of two branches of the Nilo-Saharan family. Their genetic composition denotes their Nilo-Saharan origin, with linguistic replacements in some groups. Population displacement, whether followed by cultural or by genetic exchange with local populations, would explain why not every Nilo-Saharan speaking group has this genetic component (as is the case of Nubians) and not every population that has it is mainly formed by Nilo-Saharan speakers (as is the case of Niger-Kordofanian speaking Nuba). The North African/Middle Eastern genetic component is identified especially in Copts. The Coptic population present in Sudan is an example of a recent migration from Egypt over the past two centuries. They are close to Egyptians in the PCA, but remain a differentiated cluster, showing their own component at k = 4 (Fig. 3). Copts lack the influence from Qatar, an Arabic population, that is found in Egyptians. This may suggest that Copts have a genetic composition that could resemble the ancestral Egyptian population, without the present strong Arab influence.
http://www.nature.com/srep/2015/150528/srep09996/pdf/srep09996.pdf
What do you think, guys?