Interesting new, comprehensive study released on Iranian genetics. First of its kind and size.

https://journals.plos.org/plosgeneti...l.pgen.1008385



Abstract

Iran, despite its size, geographic location and past cultural influence, has largely been a blind spot for human population genetic studies. With only sparse genetic information on the Iranian population available, we pursued its genome-wide and geographic characterization based on 1021 samples from eleven ethnic groups. We show that Iranians, while close to neighboring populations, present distinct genetic variation consistent with long-standing genetic continuity, harbor high heterogeneity and different levels of consanguinity, fall apart into a cluster of similar groups and several admixed ones and have experienced numerous language adoption events in the past. Our findings render Iran an important source for human genetic variation in Western and Central Asia, will guide adequate study sampling and assist the interpretation of putative disease-implicated genetic variation. Given Iran’s internal genetic heterogeneity, future studies will have to consider ethnic affiliations and possible admixture.

Introduction

The highlands of Iran have been at the crossroads of human migrations [1–6] since the dispersal of modern humans out of Africa due to their geostrategic position. While exercising a strong cultural influence on neighboring regions, Iran has also repeatedly received migratory influx in the past millennia. Among others, this includes the successive southward migration of groups of Indo-European (IE) language speakers (e.g. Scythians, Medes and Persians) [7], the Arab arrival in the 7th century CE and the later influx of Turkic-speaking people from Central Asia. As a result of migrations, internal splits, admixture and other movements, today’s Iranian population comprises numerous ethnic, religious and linguistic groups (S1 Appendix, S1 Fig), prominently including Persians (65% in 2008 [8]), Iranian Azeris (16%), Iranian Kurds (7%), Iranian Lurs (6%), Iranian Arabs (2%), Iranian Baluchis (2%), Iranian Turkmen (1%), Qashqai and other Turkish-language tribal groups (1%) as well as Armenians, Assyrians, Georgians, Jews, Zoroastrians (all <1%) and others, although definitions [9] and reported proportions vary between sources (e.g. [10–12]). Speakers of an Iranian, i.e. Indo-European, language or language dialect (e.g. Persian, Kurdish, Luri, Baluchi) by far outnumber speakers of either a Turkic or Semitic language.

With Iran being located within a belt of countries where consanguineous marriages are widely practiced, Iranian samples have featured prominently in disease-related studies, facilitating the identification of genes involved in rare autosomal recessive diseases by linkage analysis and autozygosity mapping and contributing to a deeper etiological understanding also of complex disorders [13, 14]. Examples demonstrating the value of these populations for human genetic research are ample (e.g. [15–21] for Iran alone), likely moving from the study of few families to population-based studies in the future [22–25]. Still, consanguinity levels are not homogeneous across the Iranian population. Early studies, based not on actual genetic data but on familial relation assessment, found these levels to vary between geographic regions and between ethnic groups [26, 27]. A recent study, also based on familial relation assessment, refined these results and reported differences in consanguinity by province, area of residence, birth and marriage cohort as well as with educational level [28]. Patterns of runs of homozygosity (ROHs) or haplotype sharing by descent (HBD) can be indicative of autozygosity, but vary between populations and across genomic locations [29–34], as do the frequency of consanguinity and the moderately correlated degree of genomic inbreeding [35]. Furthermore, autozygosity mapping is predominantly able to detect comparatively recent, local founder mutations [13]. Moreover, carrier frequencies of disease-predisposing variants have been reported to strongly differ between geographic regions in Iran, e.g. for mutations in the GJB2 gene [36] and for β-thalassemia [37], with different ethnic affiliations being the likely cause and possibly helping to determine the pathogenicity of those variants [38]. Finally, studies on copy-number variation (CNV) in the Iranian population (e.g. [39]) have so far been disease-specific and have not covered the general, healthy population.

Perhaps somewhat surprisingly, Central Asia and parts of Western Asia have largely been a blind spot for non-medical genetic studies in the past decades. Until recently, dedicated genetic projects of extant human populations with a global or continental focus (e.g. [33, 40–51]) only sporadically included samples, if any, from Iran and did not comprehensively cover the area. Of note, studies that did include Iranian samples frequently treated them as coming from or being representative of a single homogeneous population.

Studies on sporadic ancient DNA (aDNA) samples from the Early Neolithic up to the Chalcolithic in Iran showed the existence of highly genetically differentiated populations that were not ancestral to Europeans but, in the case of specimens from the Zagros Mountains, exhibited some affinity to Zoroastrians [1, 2, 6, 52]. An early study on ABO blood groups found extreme differences between some of 21 considered ethnic groups in Iran [53], whereas another study, published a year later and additionally based on serum proteins and cell enzymes, presented evidence for population substructure between the six included groups (Iranian Turks, Kurds, Lurs, Zabolis, Baluchis and Zoroastrians) with an average FST value of 0.02 and some degree of inbreeding [54]. More regionally focused studies on Iran, based on uniparental markers such as Y-chromosomal haplogroups and short tandem repeat (STR) marker haplotypes as well as mitochondrial (mtDNA) haplogroups, confirmed high degrees of genetic diversity in the Iranian population [3–5, 55–61]. These studies reported the respective variation to be predominantly of Western Eurasian origin, with only limited contributions from eastern Eurasia, South Asia and Africa most pronounced in the southern Iranian provinces. These studies also reported ancient and recent gene flow between Iran and the Arabian Peninsula, a surprisingly close relationship between Persians and Iranian Turkic-speaking Qashqai and generally high levels of variation comparable to those in the South Caucasus, Anatolia and Europe. These observations all support the notion of Iran forming a crossroads of human migrations. Notably, a study on Armenians, located to the north of Iran, also suggested multiple admixture events and a general role as bridge between different geographic regions [49].

Using genome- or exome-wide genotype data, a number of studies have analyzed samples of populations that can be considered proxies for ethnic groups in Iran from surrounding countries. In a study of 156 individuals, the population of Qatar was reported to comprise three distinct groups, with one (“Q2”) showing strong affinity to Persians and patterns of admixture [3, 62, 63]. A study on 22 Kuwaitis with Persian ancestry found comparatively high levels of genetic diversity for a non-African population, explicable by past admixture events [34]. A study of 43 individuals belonging to the Parsis, a Zoroastrian religious community in India and Pakistan, demonstrated a closer genetic affinity to today’s Iranian and Caucasus populations than to South Asian populations, but, quite remarkably, an even stronger similarity to Neolithic aDNA samples from Iran compared to modern Iranians, consistent both with the historic record of a southward migration induced by the 7th century’s Arab entry to Iran and more recent admixture events with the modern Iranian population [64]. Findings of increased homogeneity and the dating of past admixture events in further samples of Iranian and Indian Zoroastrians [65] complemented these results. Analysis of 24 individuals from the Indo-European speaking Kalash, a population isolate at the Hindu Kush, Afghanistan, indicated a genetically drifted ancient northern Eurasian population that split during the very early Neolithic and subsequently migrated southwards [66]. Finally, a recent study restricted to exome data merged 87 Iranian with 136 Pakistani samples and demonstrated a somewhat extreme or isolated position when compared to other populations from the Maghreb and from the Arabian Peninsula through Turkey [33]. Still, none of these studies has directly and comparatively studied ethnic groups in Iran.

Correlation between genetic and linguistic proximity of populations has frequently been assumed to be the rule, while language adoption is usually considered as an exception to the rule of co-evolution (e.g. [67–69]), although such claims have repeatedly been disputed (e.g. [70]). Evidence for such correlation is ample in Europe, including autosomal and mitochondrial data [71–77], Y-chromosomal data [77–80] and even, with respect to the spread of Indo-European languages into Europe, ancient DNA data [81]. In-depth studies on other parts of the world found some correlation of language dispersal with Y-chromosomal lineages [82–87], although not in all parts [88]. Furthermore, some instances of male-mediated gene flow over major linguistic barriers have been inferred as well [89, 90]. An early study already observed close genetic relationship between Semitic-speaking and Indo-European-speaking groups in Iran [58]. Studies on neighboring Armenia found evidence for a language replacement [91] event, possibly facilitated by the mixing of multiple source populations during the Bronze Age [49]. However, the relationship between genetic and linguistic proximity has been rarely investigated for Iran and neighboring countries.

While Iran appears to be destined to make further important contributions to human genetic research, an adequate design and interpretation of future medical and population genetic studies is mandatory to arrive at interpretable findings. Here, we comprehensively analyzed the genome-wide diversity of eleven ethnic groups in Iran, their relation to each other as well as with global and local reference populations. Furthermore, we investigated, stratified by ethnicity, levels of consanguinity, the distribution of homozygous and copy-number regions and CNVs as well as the extent of population stratification within Iran and the possible effects in association studies if not accounted for properly and the relationship between spoken language family and genetic proximity.

Results

We compiled a genome-wide data set comprising 1021 unrelated individuals from 11 major Iranian ethnic groups living in Iran (Table 1). For comparison with extant populations, this Iranian data set was merged with either samples from the 1000 Genomes (“1000G”) Project [41–43] (global data set) or with those from three recent studies with a more regionalized focus [2, 6, 44] (local data set), being further grouped by geographic region (S1 Table) or language family (S2 Table). We also compiled 798 human ancient DNA (aDNA) samples from 21 different publications and one pre-print [2, 6, 81, 92–110] (S3 and S4 Tables) for spatial-temporal analysis.


Table 1. Samples included in this study.
doi:10.1371/journal.pgen.1008385.t001


Distinct genetic diversity and substantial heterogeneity

The 11 included Iranian ethnic groups featured distinct and substantial genetic heterogeneity (Fig 1A). Seven groups (Iranian Arabs, Azeris, Gilaks, Kurds, Mazanderanis, Lurs and Persians) strongly overlapped in their overall autosomal diversity in an MDS analysis (Fig 1B), suggesting the existence of a Central Iranian Cluster (CIC), notably also including Iranian Arabs and Azeris. The other four groups (Iranian Baluchis, Persian Gulf (PG) Islanders, Sistanis and Turkmen) presented as strongly admixed populations with contributions by different ancestral populations but always with an orientation towards the CIC, being strikingly different from the CIC and from each other, except for Baluchis and Sistanis who partially overlapped (Fig 1A). On a global scale (Fig 2 including “Old World” populations only; see S2 Fig for all 1000G populations), CIC Iranians closely clustered with Europeans, while Iranian Turkmen showed similar yet distinct degrees of admixture compared to other South Asians. The degree was less pronounced for Baluchis, Sistanis and PG Islanders, with the latter showing a pointed orientation towards Sub-Saharan Africans and a co-localization with numerous Latin American samples. Notably, Iranian Arabs now showed some detachment from the CIC towards Sub-Saharan populations. A local comparison corroborated the distinct genetic diversity of CIC Iranians relative to other geographically close populations [2, 6, 44] (Fig 3 and S3 Fig). Strikingly, the relative genetic location of the Iranian ethnic groups mirrored their geographic location at the nexus between South and Central Asia and West Asia, Northern Africa and the Caucasus. Iranian Baluchis and Sistanis clustered with or nearby Pakistani and other South Asian populations, whereas Iranian Turkmen located next or atop Central Asian populations, respectively. Iranian Arabs appeared distinct from other Arab populations in West Asia and Northern Africa. 
Furthermore, Zoroastrian samples [6] located as essential CIC members. These results were closely mirrored by the pairwise fixation index (FST) values (Table 2 and S5 Table). CIC groups showed little differentiation (FST~0.0008–0.0033), whereas non-CIC groups consistently yielded much larger values, most extreme for PG Islanders vs Iranian Turkmen (FST = 0.0110). Still, genetic substructure was much smaller among Iranian groups than in relation to any of the 1000G populations, supporting the view that the CIC groups form a distinct genetic entity, despite internal heterogeneity. Europeans (FST~0.0105–0.0294), South Asians (FST~0.0141–0.0338), but also some Latin American populations (Puerto Ricans: FST~0.0153–0.0228; Colombians: FST~0.0170–0.0261) were closest to Iranians, whereas Sub-Saharan Africans and admixed Afro-Americans (FST~0.0764–0.1424) as well as East Asians (FST ~ 0.0645–0.1055) showed large degrees of differentiation with Iranians. If not corrected for, the observed degree of population substructure could severely confound population-based genetic association studies in Iran. In the extreme scenario of cases being sampled exclusively from one ethnic group and controls from another, CIC groups would yield moderate, although still problematic, genomic inflation factor (GIF) values (1.17–1.61), whereas non-CIC groups may yield values up to 3.0 (Table 2).
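For readers who want to see what a pairwise FST value like the ones quoted above measures, here is a minimal sketch in Python. It uses the Hudson estimator in its ratio-of-averages form, which is an assumption on my part: the excerpt does not say which FST estimator the authors used. The function name and the input layout (per-SNP allele frequencies plus the number of sampled chromosomes per group) are also illustrative.

```python
def hudson_fst(freqs_a, freqs_b, n_a, n_b):
    """Hudson FST between two groups (ratio of averages over SNPs).

    freqs_a, freqs_b: per-SNP frequencies of the same allele in each group.
    n_a, n_b: number of sampled chromosomes (2 x individuals) per group.
    NOTE: illustrative sketch; not necessarily the estimator used in the paper.
    """
    if len(freqs_a) != len(freqs_b):
        raise ValueError("both groups need frequencies for the same SNPs")
    numerator = 0.0
    denominator = 0.0
    for p1, p2 in zip(freqs_a, freqs_b):
        # Squared frequency difference, corrected for finite sample size.
        numerator += ((p1 - p2) ** 2
                      - p1 * (1 - p1) / (n_a - 1)
                      - p2 * (1 - p2) / (n_b - 1))
        # Probability that two alleles drawn from different groups differ.
        denominator += p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        return float("nan")  # no informative SNPs
    return numerator / denominator
```

Two groups with identical frequencies give a value at or slightly below zero, and two groups fixed for different alleles at every SNP give exactly 1. The CIC values of roughly 0.001–0.003 sit very close to the lower end of that scale.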


Ancestry analysis of Iranian ethnic groups

We further explored the genetic composition and origin of the Iranian ethnic groups. ADMIXTURE [111] analyses corroborated the existence of the postulated CIC and pointed to the existence of a distinct Iranian ancestral component. In the analysis of the 11 Iranian groups alone (best-fit model for k = 4), all seven CIC groups featured a single predominant ancestry and slightly varying proportions for the other three ancestral groups, whereas the other four varied in their degree of admixture with different ancestral populations (Fig 4A). Even more strikingly, the global data set analysis (best-fit k = 13) yielded three ancestral populations that substantially and almost exclusively contributed to the 11 Iranian groups but were barely seen in the 1000G populations, with one ancestral population shared across all 11 groups (colored blue in Fig 4B) and another one shared by all groups except for PG Islanders which featured a different dominant ancestral population (colored light-green and light-blue in Fig 4B, respectively). A notable exception was the Tuscans (TSI), sharing a substantial proportion of ancestry with Iranians, in particular those from the CIC. A regional comparison corroborated the unique composition of the Iranian ethnic groups (Fig 4C), with Zoroastrian and other Iranian samples showing a concordant picture. Random down-sampling of our Iranian data set to sizes similar to those of the reference groups confirmed that this result was not due to our comparatively large sample sizes (S4 Fig). Explicit modeling of 0–15 migration events using TreeMix [112] evidenced the robustness of the close clustering of all Iranian groups, with Europeans always closest to Iranians (S5–S10 Figs). An influx of ancestors from Asian populations to both Turkmen and Finns was consistently inferred, while Iranian Arabs apparently received some African influx.
Modelling Iranians as resulting from admixture between pairs of 1000G populations resulted in positive f3 statistics [113] throughout, thus supporting the primarily autochthonous origin of the CIC groups, except for non-CIC Turkmen that consistently showed negative f3 values (median -0.0083; range -0.0023 to -0.0096) for any pair of a European and an East Asian population (S6 Table), yielding the strongest evidence for Tuscans admixing with Han Chinese or Japanese (f3 = -0.0093 to -0.0096; Z = -29.2370 to -30.1030). Modelling non-CIC groups as resulting from admixture between a CIC group and a 1000G population yielded a more nuanced picture (S7 Table). While Sistanis consistently appeared to be admixed between CIC and South Asian groups and, less pronouncedly, with Southern Han Chinese, Turkmen revealed components from CIC, African, European, East Asian and, less pronounced, South Asian groups. PG Islanders and also Baluchis comprised a limited African component but no apparent influx from other groups besides the CIC.
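To make the sign logic of the f3 test above concrete, here is a minimal sketch of the basic statistic f3(C; A, B): the mean over SNPs of (c - a)(c - b), where c, a and b are allele frequencies in the target and the two candidate source populations. This is a simplified version under stated assumptions. It leaves out the finite-sample bias correction and the block-jackknife standard errors that produce Z-scores like those reported in the paper, and the function name is illustrative.

```python
def f3_statistic(freqs_c, freqs_a, freqs_b):
    """Uncorrected f3(C; A, B): mean of (c - a) * (c - b) over SNPs.

    A clearly negative value suggests that C is admixed between populations
    related to A and B; a positive value is uninformative on its own.
    NOTE: no sample-size correction and no jackknife Z-score (simplification).
    """
    if not (len(freqs_c) == len(freqs_a) == len(freqs_b)):
        raise ValueError("all three populations need the same SNPs")
    if not freqs_c:
        raise ValueError("at least one SNP is required")
    total = 0.0
    for c, a, b in zip(freqs_c, freqs_a, freqs_b):
        total += (c - a) * (c - b)
    return total / len(freqs_c)
```

When the target's frequencies lie between those of the two sources, as expected for a recent mixture, each product is negative. That is the pattern reported for Turkmen with European and East Asian sources.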



Temporal-spatial relationship of extant Iranians with ancient DNA samples

When relating our extant Iranian samples with published ancient DNA (aDNA) samples of different time strata from Iran and beyond to trace temporal-spatial movements of human populations, we did not find indications for substantial migrations into the CIC groups except for Caucasus populations during Neolithic through Bronze Age times (Figs 5–7), with the latter presenting either as a source or as a refuge, i.e. a migration target. In particular, contributions by Steppe people were apparently very limited and restricted to the Bronze Age or briefly before (Fig 6). Overall, the CIC groups appeared to have experienced a largely autochthonous development over at least the past 5,000 years. Remarkably, Early Neolithic Iranian samples [6, 107] from Western Iran and Tappeh Hesar co-localized with the more remotely located extant PG Islanders (Fig 5), whereas later Bronze Age samples from Tappeh Hesar showed a trend towards the CIC (Fig 6), possibly indicating ongoing admixture between these groups. Of note, Central Asian aDNA samples from the Neolithic and the Bronze Age also co-localized with PG Islanders and showed a similar trend (Figs 5 and 6). Sistani samples most distant from the CIC clustered close to Iron Age Pakistani samples (Fig 7) and may have undergone a similar admixture with CIC groups, however, a lack of samples from the past millennia renders this an open question.


Evidence for several events of language adoption

Languages spoken by the 11 Iranian ethnic groups fell into three different families, namely Afro-Asiatic (Semitic; Arabs), Altaic (Turkic; Turkmen, Azeris) and Indo-European (IE; all others). This linguistic diversity was only partially mirrored by genetic proximity, with Turkic-speaking Iranian Azeris and Semitic-speaking Iranian Arabs closely genetically resembling IE speakers from the CIC, whereas IE-speaking Baluchis, PG Islanders and Sistanis appeared genetically detached from the other IE-speaking groups. After re-classifying our local data set with respect to language family (S2 Table), a general trend of closer genetic proximity, as assessed by a principal-components analysis, for speakers of a language from the same family became obvious (S11A Fig). However, IE speakers fell apart into broadly two distinct groups (corresponding to the European and Indo-Iranian subbranches), while Altaic language speakers comprised widely spread genetic diversity. An approximate autocorrelation analysis based on genetic distance in the first two principal components confirmed a strong localized positive correlation between genetic proximity and spoken language family (S11B Fig).

Different levels of consanguinity in Iranian ethnic groups

Iran’s ethnic groups strongly differed in their levels of consanguinity. Iranian Arabs, Baluchis and Sistanis showed very high inbreeding coefficient values (FI ~ 0.0122–0.0132), exceeding those of the most consanguineous 1000G population (STU). Iranian Gilaks (FI = 0.0001) and Kurds (FI = 0.0010) showed almost no consanguinity, whereas the other groups showed considerably elevated consanguinity (FI ~ 0.0024–0.069) in comparison to the 1000G populations (S12A Fig and Table 3). Of note, consanguinity varied widely within each group, with 50% of individuals showing FI values below 0.0051 (Iranian Arabs), 0.0042 (Iranian Sistanis) and 0.0036 (Iranian Baluchis), respectively, and virtually equal to zero in the remaining groups. Cumulative lengths of IBDseq-inferred autozygous regions and of PLINK-defined runs of homozygosity (ROHs) closely mirrored the distribution of inbreeding values (S12B and S12C Fig). Likelihood-based ROH definition and subsequent length classification by GARLIC (S12D–S12F Fig) revealed substantial amounts of ancestral class-A cumulative ROH length in virtually all Iranian ethnic groups and 1000G populations but also generally much shorter recent class-C cumulative ROH length. Iranian Arabs, Baluchis and Sistanis most prominently deviated from this trend, while most other Iranian groups showed still elevated values, indicating ongoing consanguinity through the past millennia.
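As a rough illustration of what a per-individual inbreeding coefficient captures, here is a sketch of the common excess-homozygosity estimator F = (O - E) / (N - E). O is the observed number of homozygous genotypes, E is the number expected under Hardy-Weinberg proportions, and N is the number of genotyped SNPs. Whether the FI values above were computed this way is an assumption, since the excerpt does not name the estimator; the function and argument names are hypothetical.

```python
def inbreeding_coefficient(genotypes, alt_freqs):
    """Excess-homozygosity estimate of F for one individual.

    genotypes: alternate-allele counts per SNP (0, 1 or 2; None = missing).
    alt_freqs: population alternate-allele frequency per SNP.
    NOTE: illustrative; the paper's FI estimator may differ.
    """
    if len(genotypes) != len(alt_freqs):
        raise ValueError("genotypes and frequencies must cover the same SNPs")
    observed_hom = 0
    expected_hom = 0.0
    n_snps = 0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue  # skip missing calls
        n_snps += 1
        if g != 1:
            observed_hom += 1  # 0 or 2 copies means homozygous
        expected_hom += 1 - 2 * p * (1 - p)  # Hardy-Weinberg homozygosity
    if n_snps == expected_hom:
        return float("nan")  # all SNPs monomorphic, F undefined
    return (observed_hom - expected_hom) / (n_snps - expected_hom)
```

With this estimator, an individual with exactly the expected number of homozygous sites gets F = 0. Group values around 0.012, like those reported for Iranian Arabs, Baluchis and Sistanis, mean homozygosity is about 1.2% of the way from the random-mating expectation to complete homozygosity.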



Akin to previously studied populations, the genomic distribution of PLINK-defined ROHs followed a highly non-uniform pattern that was highly concordant across all groups (S13A Fig) and similar to that obtained for the non-African 1000G populations (S14 Fig; analysis performed on the markers present in the merged data set), with a number of ROHs reaching substantial frequencies in the Iranian population (S8 Table). CNVs, as defined by the Axiom Analysis Suite v4.0 software, were predominantly detected in Iranian Gilaks, Mazanderanis and Sistanis (S15 Fig) and also comprised a highly non-uniform genomic distribution that showed virtually no systematic overlap with ROHs (S13B and S13C Fig), resulting in a number of high-frequency CNV regions (“CNV islands”; S9 Table) in healthy individuals.
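The idea behind a run of homozygosity can be sketched as a simple scan for stretches of consecutive homozygous genotypes. This is deliberately much cruder than PLINK's sliding-window procedure or GARLIC's likelihood-based calls mentioned above: no heterozygote or missing-call tolerance, no SNP-density checks. The thresholds and names here are arbitrary assumptions.

```python
def find_roh(genotypes, positions, min_snps=3, min_length_bp=0):
    """Return (start_bp, end_bp, n_snps) for each run of homozygous SNPs.

    genotypes: alternate-allele counts per SNP (0, 1, 2; None = missing),
               ordered along one chromosome.
    positions: base-pair position of each SNP, same order.
    NOTE: simplified scan; heterozygous or missing calls always end a run.
    """
    if len(genotypes) != len(positions):
        raise ValueError("genotypes and positions must have the same length")
    runs = []
    start = None  # index where the current homozygous run began

    def close_run(first, last):
        n_snps = last - first + 1
        length = positions[last] - positions[first]
        if n_snps >= min_snps and length >= min_length_bp:
            runs.append((positions[first], positions[last], n_snps))

    for i, g in enumerate(genotypes):
        if g in (0, 2):
            if start is None:
                start = i
        elif start is not None:
            close_run(start, i - 1)
            start = None
    if start is not None:
        close_run(start, len(genotypes) - 1)
    return runs
```

Summing run lengths per individual gives a cumulative ROH length of the kind compared across groups, and counting how many individuals carry a run over a given position gives the non-uniform genomic distribution described here.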

Differences in allele frequencies across Iranian ethnic groups

The observed genetic diversity and partially different ancestry was also evident in the frequency differences for numerous trait-related or predisposing alleles in the Iranian ethnic groups (S10 Table). In general, CIC groups tended to have very similar allele frequencies that were nevertheless often markedly different from those of Europeans, while Iranian Baluchis and Sistanis showed a tendency towards South Asians, although these trends were not present at all markers. A notable exception was lactase persistence-causing marker allele rs4988235-T whose frequency in Iranian Baluchis (22%) was much higher than in any of the other Iranian groups, raising the prospect of convergent evolution [114]. However, we did not find evidence for a selective sweep based on Tajima’s D (S16 Fig) nor when using the integrated haplotype score (iHS) approach [115] (S17 Fig). Although rs4988235 showed a substantial absolute score in Baluchis (|iHS| = 2.42), this value was not significant (two-sided p>0.05) and we also did not observe a clear clustering of SNPs with extreme values as a possible indication for positive selection [116].
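Since Tajima's D is one of the two selection tests mentioned above, here is a compact sketch of the standard formula. It compares the mean number of pairwise differences with the number of segregating sites scaled by the harmonic number a1. The input format (phased biallelic haplotypes coded 0/1) is an assumption for illustration and says nothing about how the authors ran their analysis.

```python
from math import nan, sqrt


def tajimas_d(haplotypes):
    """Tajima's D for equal-length haplotypes coded as 0/1 per site.

    Negative values indicate an excess of rare variants (e.g. after a sweep
    or expansion); positive values an excess of intermediate frequencies.
    NOTE: illustrative sketch; requires at least 4 haplotypes.
    """
    n = len(haplotypes)
    if n < 4:
        raise ValueError("need at least 4 haplotypes")
    n_pairs = n * (n - 1) / 2
    seg_sites = 0
    pi = 0.0  # mean number of pairwise differences
    for site in zip(*haplotypes):
        derived = sum(site)
        if 0 < derived < n:
            seg_sites += 1
            pi += derived * (n - derived) / n_pairs
    if seg_sites == 0:
        return nan  # undefined without segregating sites

    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)
```

A recent selective sweep around rs4988235 would be expected to push D in the surrounding window towards negative values. The authors report that they did not find such a signal in Baluchis.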

Discussion

Our study, based on genome-wide data from a stratified ethnic-group sampling and also including groups previously not well covered, such as Iranian Gilaks, Kurds, Mazanderanis and Sistanis, revealed the distinct and rich genetic diversity of the Iranian population, corroborating previous reports based on uniparental markers. The majority of Iran’s ethnic groups comprise largely overlapping genetic autosomal diversity, implicating a shared and largely autochthonous ancestry, designated as the Central Iranian Cluster (CIC). Notably, the CIC also includes Iranian Arabs and Azeris (Fig 1) as well as the religious group of Zoroastrians (Fig 3), being consistent with the suggestion that Zoroastrians have lived in the area of present-day Iran for millennia and had formed an early group of Indo-European speakers. Still, the CIC comprised substantial internal structure, with pairwise FST values up to an order of magnitude higher than those for more homogeneous populations of similar population size, such as Germany [117], but below the level of substructure reported for Europe, Central Asia, the Near East or Southeast Asia as a whole [45] and much lower than for neighboring Armenia in the Caucasus [118]. Iranian Baluchis, Sistanis, Turkmen and Persian Gulf Islanders showed strong admixture, with the CIC (or its ancestral population) consistently contributing to all of them and contributions from different respective ‘opposite’ ancestral populations, evidencing CIC’s strong impact on human demography in this world region. Since substantial proportions of the Iranian population belong to non-Persian ethnic groups or are admixed, more precise reference to the particular ethnic groups appears mandatory when conducting future genetic studies.

In comparison with global and local reference data, the CIC represents a distinct entity comprising an autochthonous genetic component, clustering closely with geographically adjacent populations and assuming a location in the ‘genetic map’ that corresponds to its geographic location at the nexus between South, Central and West Asia, Northern Africa and the Caucasus. This observation is consistent with limited gene flow reported in previous uniparental marker-based studies and adding a further example on the correspondence between genetic diversity and geographic location, such as Europe [73, 119], explicable by genetic drift as well as admixture. The largely autochthonous development of CIC groups, consistent with an early branching from the Eurasian population before the Neolithic [6], is further corroborated by the distinctiveness of these groups in comparison to different time strata represented by aDNA samples, indicating a genetic continuity for at least several past millennia and eventually mirrored by Zoroastrian genomic diversity. Both, Early Neolithic farmers from West Iran and people from the Steppe appear to have made very limited contributions to CIC groups. In turn, the ‘African’ component shared between PG Islanders and some Sub-Saharan populations likely predates the beginning of the Neolithic and, thus, renders PG Islanders as an early autochthonous group that subsequently became strongly admixed with CIC groups. Notably, Iranian Arabs appear to be slightly genetically detached from other Arab populations in West Asia and Northern Africa. The small ancestry component shared between the CIC and Tuscans may mirror early migrations from the Near East although this requires further investigation.

Correlating genetic affinity with spoken language yielded evidence for a number of language adoption cases in Iran. CIC’s distinct and autochthonous genetic variation indicates that Indo-European (IE) language(s) were likely adopted by some ancient population in Iran several millennia ago, although it remains unclear if this was driven by previously suggested aggressive warrior-bands migration [120] given the lack of Y-chromosomal data in our study. The observed close genetic proximity, based on genome-wide data, of Turkic-speaking Iranian Azeris as well as of Semitic-speaking Iranian Arabs to IE-speaking groups within the CIC, confirms previous reports on Semitic-speaking groups in Iran [58] and Turkic-speaking Azerbaijanis [91, 121–123]. Given their genetic vicinity to other Arab and Caucasian populations, respectively, this is well explained by admixture between some overwhelmingly contributing ancestral IE population(s) and a minor genetic contributor whose language was adopted in the course of past entries. Finally, the spread of IE-speaking Iranian Baluchis, Sistanis and PG Islanders from the other IE-speaking CIC groups is explicable by repeated admixture of some IE-speaking ancestral population(s) with ancient South or West Asian populations, such as Early Neolithic West Iranians, respectively, while retaining their language, causing its adoption by the admixed offspring.

The heterogeneous levels of substantial population substructure as well as of elevated consanguinity in the Iranian population have profound implications for future human genetic studies. They corroborate previous reports on different predisposing variant frequencies across Iran (e.g. [36, 37]) and emphasize the need for an ethnicity-aware approach when performing human genetic studies or genetic counseling in Iran. Population-based association studies should focus on CIC groups to minimize biasing effects due to population stratification, applying to common single-marker analysis but in particular to rare-variant collapsing tests where regional and ethnic group-specificity is to be expected due to the average young age of these variants. Given the genetic diversity even within the CIC, ancestry correction appears mandatory while sample inclusion from the highly admixed groups may increase the risk of biased results. The observed elevated consanguinity in some ethnic groups is in line with previous reports on Iran and other West Asian populations, indicating past and ongoing consanguineous marriage practice and also possibly explaining reported differences between Iranian provinces and residential areas. Family-based linkage or homozygosity-mapping studies should preferentially target groups featuring increased consanguinity levels, namely Iranian Arabs, Baluchis and Sistanis, to increase power especially for studying autosomal-recessive diseases. When studying runs of homozygosity and copy-number variants in diseased individuals, for example in whole-exome and whole-genome sequencing studies, the frequent occurrence of such features in healthy individuals, as shown in this work, requires caution in the interpretation of these features.
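The stratification concern raised here is what the genomic inflation factor (GIF) values quoted in the Results quantify. A minimal sketch of the usual definition: the median of the observed 1-df chi-square association statistics divided by the median expected under the null, about 0.4549. The function name is illustrative, and the input is assumed to be one test statistic per SNP.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(chi2_stats):
    """Genomic inflation factor (lambda) from per-SNP 1-df chi-square values.

    Values near 1.0 indicate no inflation; larger values suggest confounding,
    for example by population stratification between cases and controls.
    """
    if not chi2_stats:
        raise ValueError("at least one test statistic is required")
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

On this scale, the values of 1.17–1.61 reported for case/control splits between CIC groups mean that the median test statistic is inflated by 17–61% before any true association is considered.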

In summary, Iranians feature distinct genetic variability, resulting from long-standing genetic continuity, as well as substantial genetic heterogeneity and can, thus, not be treated as a single homogeneous entity. Future human genetic studies have to consider ethnic affiliations for sampling and analyses and should expect widespread admixture in both extant and ancient samples. The observed concordance between genetic diversity and geographic location and examples of lineage break up between language and genetic proximity are consistent with the archeological and historical evidence on Iran as occupying a stretch of land that has seen multiple migration and admixture events in the past millennia. By providing genome-wide population data for Western Asia, thereby filling a lack that has characterized this region for over a decade despite its known diversity and prominent place in human history, we hope to encourage future population genetic, evolutionary and medical studies in Iran and beyond.