Population history from the Neolithic to present on the Mediterranean island of Sardinia: An ancient DNA perspective
https://www.biorxiv.org/content/10.1101/583104v1
Some passages:
Recent ancient DNA studies of western Eurasia have revealed a dynamic history of admixture, with evidence for major migrations during the Neolithic and Bronze Age. The population of the Mediterranean island of Sardinia has been notable in these studies: Neolithic individuals from mainland Europe cluster more closely with Sardinian individuals than with all other present-day Europeans. The current model to explain this result is that Sardinia received an initial influx of Neolithic ancestry and then remained relatively isolated from expansions in the later Neolithic and Bronze Age that took place in continental Europe. To test this model, we generated genome-wide capture data (approximately 1.2 million variants) for 43 ancient Sardinian individuals spanning the Neolithic through the Bronze Age, including individuals from Sardinia’s Nuragic culture, which is known for the construction of numerous large stone towers throughout the island. We analyze these new samples in the context of previously generated genome-wide ancient DNA data from 972 ancient individuals across western Eurasia and whole-genome sequence data from approximately 1,500 modern individuals from Sardinia. The ancient Sardinian individuals show a strong affinity to western Mediterranean Neolithic populations, and we infer a high degree of genetic continuity on the island from the Neolithic (around the fifth millennium BCE) through the Nuragic period (second millennium BCE). In particular, during the Bronze Age in Sardinia, we do not find significant levels of the “Steppe” ancestry that was spreading in many other parts of Europe at that time. We also characterize subsequent genetic influx between the Nuragic period and the present. We detect novel, modest signals of admixture between 1,000 BCE and the present day from ancestry sources in the eastern and northern Mediterranean.
Within Sardinia, we confirm that populations from the more geographically isolated mountainous provinces have experienced elevated levels of genetic drift and that northern and southwestern regions of the island received more gene flow from outside Sardinia. Overall, our genetic analysis sheds new light on the origin of Neolithic settlement on Sardinia, reinforces models of genetic continuity on the island, and provides enhanced power to detect post-Bronze-Age gene flow. Together, these findings offer a refined demographic model for future medical genetic studies in Sardinia.
Continuity from the Sardinian Neolithic through the Nuragic
We found several lines of evidence supporting genetic continuity from the Sardinian Neolithic into the Bronze Age and Nuragic times. Importantly, we observed low genetic differentiation between ancient Sardinian individuals from different time periods: we estimated FST = 0.0027 ± 0.0014 between Neolithic and late Bronze Age (mostly Nuragic) Sardinian individuals (Fig. 3). Furthermore, we did not observe temporal substructure among the ancient Sardinian individuals on the top two PCs; they form a coherent cluster (Fig. 2). In stark contrast, ancient individuals from many mainland regions, such as central Europe, shift substantially along the first two PCs from the Late Neolithic to the Bronze Age and also show higher pairwise differentiation (FST = 0.0194 ± 0.0003).
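To make the FST comparisons concrete, here is a minimal sketch of one widely used estimator, Hudson's FST in the ratio-of-averages form described by Bhatia et al. (2013). The excerpt does not say which estimator produced the values above, so treat the choice as an assumption; the function name `hudson_fst` and its inputs (per-SNP sample allele frequencies and numbers of sampled chromosomes) are illustrative.

```python
import numpy as np

def hudson_fst(p1, p2, n1, n2):
    """Ratio-of-averages Hudson FST (illustrative sketch).

    p1, p2: sample allele frequencies per SNP for the two populations
    n1, n2: number of sampled chromosomes per SNP (scalars or arrays)
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    # Numerator: squared frequency difference minus a correction for
    # the sampling noise within each population
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    # Denominator: probability that one chromosome from each population differs
    den = p1 * (1 - p2) + p2 * (1 - p1)
    # Sum numerators and denominators over SNPs separately, then take the ratio
    return float(num.sum() / den.sum())

# Hypothetical toy frequencies for two groups at four SNPs
print(hudson_fst([0.10, 0.40, 0.55, 0.90], [0.15, 0.35, 0.60, 0.85], 40, 40))
```

Averaging numerator and denominator separately, rather than averaging per-SNP ratios, keeps SNPs with low diversity from dominating the estimate.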
If there had been significant influx, a test population x with differential genetic affinity to the two periods would make f4 statistics of the form f4(Sard Period 1, Sard Period 2; Pop x, Ancestral Allele) non-zero (where “Ancestral Allele” is an inferred ancestral allelic state from a multi-species alignment). However, none of these statistics differs significantly from zero for any test population x (Supp. Mat. 2D). A qpAdm analysis, which tests f-statistics against a number of outgroups simultaneously and adjusts for their correlations, likewise cannot reject a model in which Neolithic Sardinian individuals are a direct predecessor of Nuragic Sardinian individuals (p = 0.54, Supp. Tab. 3). Our qpAdm analysis further shows that the WHG ancestry proportion, in a model of admixture with Neolithic Anatolia, remains stable at ∼17% across three ancient time periods (Tab. 1A). Using a three-way admixture model, we do not detect significant Steppe ancestry in any ancient Sardinian individual, unlike, for example, later Bronze Age Iberians, in whom it is inferred (Tab. 1B, Supp. Fig. 9).
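The f4 statistics in this paragraph are averages, over SNPs, of products of allele-frequency differences. A minimal sketch, assuming frequencies are of the derived allele so that the inferred ancestral state behaves as a pseudo-population with frequency 0 at every SNP; the function names are hypothetical, and no standard error is computed here.

```python
import numpy as np

def f4(pA, pB, pC, pD):
    """f4(A, B; C, D): mean over SNPs of (pA - pB) * (pC - pD)."""
    pA, pB, pC, pD = (np.asarray(p, dtype=float) for p in (pA, pB, pC, pD))
    return float(np.mean((pA - pB) * (pC - pD)))

def f4_vs_ancestral(p_period1, p_period2, p_x):
    """f4(Period 1, Period 2; X, Ancestral) from derived-allele frequencies.

    Assumption: alleles are polarized so the inferred ancestral state is a
    pseudo-population whose derived-allele frequency is 0 at every SNP.
    """
    p_anc = np.zeros(len(p_x))
    return f4(p_period1, p_period2, p_x, p_anc)

# Under continuity both periods share frequencies, so X has no differential
# affinity and the statistic is zero (hypothetical values)
print(f4_vs_ancestral([0.3, 0.6, 0.2], [0.3, 0.6, 0.2], [0.9, 0.1, 0.5]))
```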
From the Nuragic to present-day Sardinia
Our results demonstrate that ancient Sardinian individuals are genetically closest to contemporary Sardinian individuals among all the ancient individuals analyzed (Fig. 3), and relative to other European populations, there is lower differentiation between present-day and ancient individuals (Supp. Fig. 7). However, we also find multiple lines of evidence for appreciable gene flow into Sardinia after the Nuragic period.
First, present-day Sardinian individuals are shifted away from the ancient individuals towards eastern Mediterranean populations on the western Eurasian PCA (Fig. 2). We observe a corresponding signal in our f4 analysis: many present-day and some ancient populations show significantly higher affinity to modern Sardinian than to Nuragic Sardinian individuals (f4 statistics of the form f4(Mod Sard, Ancient Sard; Pop x, Ancestral Allele); see Fig. 4 and Supp. Mat. 2D). Similarly, f3 statistics that directly test present-day Sardinians for admixture, with Nuragic Sardinian individuals as one source, yield highly significant negative values, indicating admixture (Fig. 4). Using qpAdm, we find that models of continuity without influx from Nuragic Sardinia to present-day Sardinian populations (e.g. Cagliari) are rejected (p < 10^-40, Tab. 1C). Moreover, genetic differentiation between Nuragic and present-day individuals is higher than across ancient periods (pairwise FST = 0.00695 ± 0.00041 between Nuragic and present-day non-Ogliastra individuals, compared to FST = 0.0027 ± 0.0014 between Late Neolithic and Nuragic individuals).
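The admixture f3 test has a similarly compact form: f3(Target; S1, S2) averages (pT - pS1)(pT - pS2) over SNPs. Each term is negative exactly when the target frequency lies between the two source frequencies, so a significantly negative average points to admixture. The helper below is a hypothetical sketch that omits the sample-size correction for the target population applied by published tools (Patterson et al., 2012), so for small samples it is biased upward.

```python
import numpy as np

def f3(p_target, p_src1, p_src2):
    """Uncorrected f3(Target; S1, S2) = mean of (pT - pS1) * (pT - pS2)."""
    pt, p1, p2 = (np.asarray(p, dtype=float) for p in (p_target, p_src1, p_src2))
    return float(np.mean((pt - p1) * (pt - p2)))

# A target that is a 50/50 mix of two sources sits between them at every SNP,
# so f3 is negative (hypothetical frequencies)
print(f3([0.5, 0.5], [1.0, 0.0], [0.0, 1.0]))
```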
Figure 4:
Present-day genetic structure in Sardinia reanalyzed with aDNA. A: Scatter plot of the first two principal components trained on 1,577 present-day individuals with grandparental ancestry from Sardinia. Each individual is labeled with a location if at least 3 of the 4 grandparents were born in the same geographical location (“small” three-letter abbreviations); otherwise with “x”, or with “?” if grandparental ancestry is missing. We calculated median PC values for each Sardinian province (large abbreviations). We also projected each ancient Sardinian individual onto the top two PCs (gray points). B/C: f-statistics that test for admixture of modern Sardinian individuals (grouped into provinces) when using Nuragic Sardinian individuals as one source population. Uncertainty ranges depict one standard error (calculated from a block bootstrap). Karitiana are used in the f-statistic calculation as a proxy for ANE/Steppe ancestry (Patterson et al., 2012).
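The standard errors in the caption come from a block bootstrap, which resamples chunks of genome rather than single SNPs so that linked sites are not treated as independent. A minimal sketch, assuming SNPs are ordered along the genome and grouped into equal-sized blocks of consecutive SNPs (real analyses usually define blocks by genetic or physical distance); `block_size`, `n_boot` and the function name are illustrative, and the per-SNP values could be, for instance, the per-SNP terms of an f3 statistic.

```python
import numpy as np

def block_bootstrap_se(per_snp_values, block_size, n_boot=1000, seed=0):
    """Standard error of a genome-wide mean via a block bootstrap (sketch)."""
    values = np.asarray(per_snp_values, dtype=float)
    n_blocks = len(values) // block_size
    # Simplification: drop trailing SNPs that do not fill a complete block
    blocks = values[: n_blocks * block_size].reshape(n_blocks, block_size)
    block_means = blocks.mean(axis=1)
    rng = np.random.default_rng(seed)
    # Resample whole blocks with replacement; with equal-sized blocks the
    # genome-wide mean equals the mean of the block means
    idx = rng.integers(0, n_blocks, size=(n_boot, n_blocks))
    boot_stats = block_means[idx].mean(axis=1)
    return float(boot_stats.std(ddof=1))

# Hypothetical per-SNP values; a Z-score is then mean / SE
vals = np.random.default_rng(1).normal(-0.001, 0.01, size=50_000)
se = block_bootstrap_se(vals, block_size=500)
print(vals.mean() / se)
```

A common convention in this literature is to call a statistic significant when the absolute Z-score exceeds about 3.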
Second, we find many populations that can produce significant f4 and f3 statistics consistent with admixture (Supp. Mat. 2C and D). Many of these populations carry high levels of Ancient North Eurasian (ANE) ancestry and likely serve as a proxy for the ancient Eurasian ancestry that entered Europe after the Neolithic with the Steppe expansions, as similarly observed for many present-day mainland Europeans (Patterson et al., 2012).
ADMIXTURE analysis gives further insight into this signal of gene flow. While contemporary Sardinian individuals show the highest affinity to EEF-associated populations among all modern populations, they also show membership in other clusters (Fig. 5). In contrast to ancient Sardinian individuals, present-day Sardinian individuals carry a modest “Steppe-like” ancestry component (generally smaller than in continental present-day European populations) and an appreciable, broadly “eastern Mediterranean” ancestry component (also inferred at a high fraction in other present-day Mediterranean populations, such as those of Sicily and Greece).
Figure 5:
Admixture coefficients estimated by ADMIXTURE (K = 4). Each stacked bar represents one individual, and the colored segments depict the fraction of that individual’s ancestry assigned to each “cluster”. At K = 4 (depicted here), ancient Sardinian individuals have admixture proportions similar to those of other western European Neolithic individuals. Present-day Sardinian individuals additionally carry elevated Steppe-like ancestry (but less than other European populations) and an additional ancestry component prevalent in Near Eastern / Levantine populations. ADMIXTURE results for all K = 2, …, 11 are depicted in the supplement (Supp. Fig. 14).
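The model behind these stacked bars treats each genotype as a binomial draw whose allele frequency is a mixture of cluster frequencies, P = QF, where Q holds per-individual ancestry fractions (the bars) and F holds per-cluster allele frequencies. ADMIXTURE itself maximizes this likelihood with a faster block-relaxation optimizer; the sketch below instead uses plain EM updates, the approach of the earlier program frappe, on simulated data. All names and simulation settings are hypothetical.

```python
import numpy as np

def log_likelihood(G, Q, F):
    """Binomial log-likelihood of genotypes G (n x m, counts 0/1/2) given
    ancestry fractions Q (n x K, rows sum to 1) and cluster frequencies F (K x m)."""
    P = Q @ F  # expected allele frequency per individual and SNP
    return float(np.sum(G * np.log(P) + (2 - G) * np.log(1 - P)))

def em_step(G, Q, F, eps=1e-6):
    """One EM update of Q and F under the admixture likelihood (frappe-style)."""
    P = Q @ F                                                      # n x m
    # Posterior cluster membership of the counted allele (a) and the other allele (b)
    a = Q[:, :, None] * F[None, :, :] / P[:, None, :]              # n x K x m
    b = Q[:, :, None] * (1 - F)[None, :, :] / (1 - P)[:, None, :]  # n x K x m
    Ga = G[:, None, :] * a          # expected counted-allele copies per cluster
    Gb = (2 - G)[:, None, :] * b    # expected other-allele copies per cluster
    n_snps = G.shape[1]
    Q_new = (Ga + Gb).sum(axis=2) / (2 * n_snps)
    F_new = Ga.sum(axis=0) / (Ga + Gb).sum(axis=0)
    # Keep frequencies away from 0 and 1 so the logarithms stay finite
    return Q_new, np.clip(F_new, eps, 1 - eps)

# Simulate a toy data set with K = 2 hypothetical ancestral clusters
rng = np.random.default_rng(1)
n_ind, n_snps, K = 60, 2000, 2
F_true = rng.uniform(0.05, 0.95, size=(K, n_snps))
Q_true = rng.dirichlet(np.ones(K), size=n_ind)
G = rng.binomial(2, Q_true @ F_true)

# Random starting point, then iterate EM
Q = rng.dirichlet(np.ones(K), size=n_ind)
F = rng.uniform(0.2, 0.8, size=(K, n_snps))
lls = [log_likelihood(G, Q, F)]
for _ in range(30):
    Q, F = em_step(G, Q, F)
    lls.append(log_likelihood(G, Q, F))
```

EM updates cannot decrease the likelihood (ignoring the clipping step), so `lls` climbs toward a local optimum; as with ADMIXTURE, results can depend on the starting point, and cluster labels are arbitrary.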
To further characterize signatures of admixture, we used qpAdm to test the fit of a model of present-day Sardinian populations as a simple two-way admixture between Nuragic Sardinian individuals and various other potential source populations (Table 1D, Supp. Tab. 4). A model of admixture with modern Sicilians (p = 0.031) had the best support, followed by Maltese (p = 0.0128), Turkish (p = 0.0086) and Greeks (p = 0.00071). For the model of a mixture of Sicilians and ancient Sardinian individuals, we infer an admixture proportion of 43.5 ± 2.1 percent Sicilian ancestry (Tab. 1, Supp. Tab. 4, Supp. Fig. 10).
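The admixture proportions from qpAdm rest on a linearity property: if the target's allele frequencies are a weighted mix of its sources with weights summing to 1, then f4(T, Z; C, D) is the same weighted mix of f4(S_k, Z; C, D) for any reference Z and outgroup pair (C, D). The sketch below solves for such weights by ordinary least squares. It is a simplified stand-in, not qpAdm: qpAdm additionally uses generalized least squares with a jackknife covariance and reports a p-value for model fit, both omitted here. Function names and data are hypothetical.

```python
import numpy as np

def admixture_weights(target_stats, source_stats):
    """Ordinary least-squares mixture weights constrained to sum to 1.

    target_stats: length-J vector, f4(T, Z; C_j, D_j) for j = 1..J
    source_stats: K x J matrix, row k = f4(S_k, Z; C_j, D_j), with K >= 2
    """
    y = np.asarray(target_stats, dtype=float)
    S = np.asarray(source_stats, dtype=float)
    # Substitute w_K = 1 - (w_1 + ... + w_{K-1}) to enforce the constraint:
    #   y - S_K = sum_{k<K} w_k * (S_k - S_K)
    A = (S[:-1] - S[-1]).T          # J x (K-1) design matrix
    b = y - S[-1]
    w_head, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(w_head, 1.0 - w_head.sum())

# Hypothetical noiseless example: 3 sources, 8 outgroup pairs
rng = np.random.default_rng(0)
S = rng.normal(size=(3, 8))
w_true = np.array([0.55, 0.25, 0.20])
y = w_true @ S
w_hat = admixture_weights(y, S)   # recovers w_true up to floating-point error
```

The same function handles the two-way models above (pass two source rows) and three-way models with one northern and one eastern Mediterranean proxy (pass three).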
We also considered three-way admixture models with qpAdm to further refine the geographic origins of this recent admixture signal (Supp. Info 6). Indeed, we find that models with admixture between Nuragic Sardinia, one northern Mediterranean source and one eastern Mediterranean source fit well (p > 0.01 for several combinations, Table 1E,F). For a representative sample from Sardinia (Cagliari), across various proxies (excluding Sicily and Malta), the admixture fractions range from 10-30% for the “northern Mediterranean” component and 13-33% for the “eastern Mediterranean” component, with the remaining 52-57% coming from Nuragic Sardinia. For models with Sicilian or Maltese as proxy sources, the estimated N. Mediterranean component shrinks to small values (8.6% and 6.5% for Sicilian and Maltese, respectively), essentially bringing the fitted parameters towards the two-way mixture models. Maltese and Sicilian individuals appear to reflect a mixture of N. Mediterranean and E. Mediterranean ancestries, and as such they can serve as single-source proxies in two-way admixture models with Nuragic Sardinia (Table 1D).
Caution is warranted when interpreting inferred admixture fractions with each of these simple models; however, the signal across multiple analyses indicates that complex post-Nuragic gene flow, partly from sources originating in the eastern Mediterranean and partly from the northern Mediterranean, has likely played a role in the population genetic history of Sardinia.
Fine-scale structure in contemporary Sardinia
Ancient DNA can shed new light on present-day genetic variation. We therefore re-assessed the spatial substructure previously observed in a dense geographic sampling of modern Sardinians (1,577 whole-genome sequences; Chiang et al., 2018).
In a PCA of modern Sardinian variation, individuals from Ogliastra fall furthest away from the ancient Sardinian individuals (Chiang et al., 2018) (Fig. 4). In stark contrast, in the PCA of modern western Eurasian variation the pattern reverses: Ogliastra is placed closest of all provinces to the ancient Sardinian individuals (Fig. 2). Direct tests for admixture using f3 statistics with Nuragic Sardinian individuals as one source yielded highly significant results for all present-day provinces except Ogliastra (Fig. 4). The non-significant value for Ogliastra has two possible causes: an actual lack of admixture, or high levels of drift that mask the negative admixture f3 signal. However, f4 statistics and qpAdm admixture proportions are robust to recent drift in the admixed population, and in both analyses Ogliastra shows an admixture signal only slightly weaker than that of most other provinces (Fig. 4, Supp. Fig. 12). Together, these results suggest high levels of drift specific to Ogliastra (likely also driving the first two PCs of present-day Sardinian variation), combined with somewhat less admixture than in other Sardinian provinces.
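Why drift can hide an f3 signal but leave f4 intact can be seen in a toy simulation. The setup below is entirely hypothetical: populations drift from shared ancestral frequencies by a single binomial sampling step (a crude stand-in for many generations), the target is a 70/30 mix of two sources, and the target then undergoes strong private drift. That drift adds a positive term of roughly pT(1 - pT)/10 per SNP to f3, but enters f4 only as zero-mean noise, because it is independent of (pA - pB).

```python
import numpy as np

rng = np.random.default_rng(0)
n_snps = 200_000
alpha = 0.7  # hypothetical fraction of the target's ancestry from source A

# Toy model: each population drifts from shared ancestral frequencies p0
# by one binomial sampling step
p0 = rng.uniform(0.1, 0.9, n_snps)
pA = rng.binomial(100, p0) / 100
pB = rng.binomial(100, p0) / 100
pO = rng.binomial(100, p0) / 100          # unadmixed outgroup
pT = alpha * pA + (1 - alpha) * pB        # admixed target right after admixture
pT_drift = rng.binomial(10, pT) / 10      # same target after strong private drift

def f3(p_t, p_s1, p_s2):
    return np.mean((p_t - p_s1) * (p_t - p_s2))

def f4(p_a, p_b, p_c, p_d):
    return np.mean((p_a - p_b) * (p_c - p_d))

f3_before, f3_after = f3(pT, pA, pB), f3(pT_drift, pA, pB)
f4_before, f4_after = f4(pT, pO, pA, pB), f4(pT_drift, pO, pA, pB)
print(f"f3: {f3_before:.5f} -> {f3_after:.5f}")
print(f"f4: {f4_before:.5f} -> {f4_after:.5f}")
```

In this setup f3 starts negative and turns positive once the private drift outweighs the admixture term, while f4 changes only by sampling noise.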
In the previous section, we reported that many non-Sardinian modern populations have a higher affinity to present-day Sardinian individuals than to Nuragic Sardinian individuals (using an f4 statistic of the form f4(Mod. Sard Pop y, Sar-Nur; Pop x, Ancestral Allele), where x are non-Sardinian modern test populations; Fig. 4, Supp. Mat. 2D). Interestingly, the northern provinces of Olbia (north-east) and, to some degree, Sassari (north-west) have the highest affinity to most tested populations (Fig. 4). A three-way admixture model fit with qpAdm finds a similar signal. In a model with Tuscans as a proxy for northern Mediterranean immigration and Lebanese as a proxy for an additional, more eastern Mediterranean source, the inferred admixture fractions vary across Sardinia, with the highest eastern Mediterranean ancestry in the southwest (Carbonia, Campidano) and the highest northern Mediterranean ancestry in the north of the island (Olbia, Sassari; Supp. Fig. 12). In addition, we observed a marked shift of individuals from Olbia and Sassari towards continental populations in the PCA (Fig. 4).
We find evidence for at least two phases of post-Nuragic gene flow. First, there is a general shift towards central and eastern Mediterranean sources, demonstrated by the direction of the overall change in the PCA and ADMIXTURE, and by the results of modeling population relationships with qpAdm. Second, we detected variation in the PCA and qpAdm signals suggesting that the northern provinces of Olbia, and to a lesser degree Sassari, have received more northern Mediterranean immigration after the Bronze Age than the other provinces, while the southwestern provinces of Campidano and Carbonia show more eastern Mediterranean ancestry. Together, these signals suggest temporally and geographically complex post-Nuragic gene flow into Sardinia. Ultimately, aDNA data from these historical periods will be needed to clarify and refine the interpretation.
A preliminary hypothesis would be that an influx from eastern Mediterranean sources was overlaid by more recent influx from the Italian mainland. Historically, both are plausible. Sardinia hosted major Phoenician colonies in the first millennium BCE, principally along the south and west coasts of the island, and previous studies based on uniparentally inherited markers have found evidence for Phoenician contact and gene flow (Zalloua et al., 2008; Matisoo-Smith et al., 2018). Sardinia was also an important Roman province and was later occupied by the Vandals and the Byzantine Empire. There are also more recent sources of immigration in the last few hundred years from Italy, Spain, and Corsica. Shepherds from Corsica immigrated to occupy large pastures left largely empty since the late Middle Ages, bringing an Italian-Corsican dialect (Gallurese) now prevalent in the northeastern part of Sardinia (Lannou, 1941). The patterns we observe are consistent with these external contacts having had differing impacts in different regions of Sardinia: more northern Mediterranean ancestry is inferred in the north (where Gallurese is prevalent), more eastern Mediterranean ancestry in the south and west (where more Punic colonies existed), and more isolation in the central regions of Ogliastra and Nuoro.