This paper is from 2022 but I can't see it posted anywhere. I thought it had some interesting information.
This is a new genetic plot with all of Europe including Turkey.
Significance
Recent haplotype sharing analyses in specific European populations have revealed fine-scale genetic differentiation that echoes history. An equivalent understanding across the whole European continent would place these insights into a wider context and extend understanding to underdescribed regions. Here, we leverage haplotype data from 5,500 European-ancestry individuals from the UK Biobank (UKBB) in a methodological approach to update and expand the European genetic landscape, extending coverage to regions in the southeast of Europe, identifying communities of high haplotype sharing that may be of interest to genetic mapping, such as Malta. Together, our results highlight European communities with diverse ancestries sampled within the UKBB and demonstrate the potential for insights to be made with other non-European ancestry communities using this dataset.
A sample of European structure in the UKBB. (A) The number of individuals included from each European country analyzed. Countries are grouped by geographic region; these regions are chosen as a means of group representation and do not necessarily imply historical links. Sample sizes from each region are also shown. Abbreviations are as follows: SE Europe (southeastern Europe), S Europe (southern Europe), E Europe (eastern Europe), C Europe (central Europe), N Europe (northern Europe), W Europe (western Europe), Brit. & Ire. (Britain and Ireland). (B) The sample counts for each European region. (C) The first two PCs calculated by PLINK of 5,500 European individuals. Individual genotypes are shown by letters that encode the alpha-2 ISO 3166 international standard codes and are color coded according to geographic region. The median PC for each country/region of birth is shown as a label. Plots were generated using the ggplot2 package (65) in the R statistical computing language (59).
Leiden clustering of 5,500 Europeans from the UKBB. (A) The dendrogram of Leiden clusters, grouping them according to their hierarchical relationships. The three main branches are color coded, with additional subdivisions shown as vertical lines. Each of the 41 cluster labels is shown alongside its associated color and shape coding. (B) The membership of each of the 41 Leiden clusters. The x axis shows country/region of birth, and the y axis shows cluster membership. The heat map shows the proportion of individuals from each country of birth in each cluster (Freq), and the absolute number. (C) The first two PCs of the pbwt paint chunkcounts coancestry matrix. Each point represents the phased genotype of an individual, color and shape coded according to Leiden cluster membership, using the convention shown in A. Additional labels indicate the broad European region in which individuals were born. Plots were generated using the ggplot2 package (65) in the R statistical computing language (59).
https://www.pnas.org/doi/10.1073/pnas.2119281119
These groups correspond to individuals predominantly born within the northwest of Europe (NW Europe), center and east of Europe (CE Europe), and the south of Europe (S Europe). We highlight interesting information or findings from these regions below; for a full discussion of the clustering and ancestry of these clusters see SI Appendix, Supplementary Data 5.
NW Europe.
The NW Europe branch contains individuals predominantly from Scandinavia, the Low Countries (Netherlands and Belgium), France, Switzerland, the British Isles, and Ireland. Differentiation is low within this branch, as evidenced from an average within-branch fixation index (FST) value of 0.0006 (Dataset S3) and more limited dispersal in PC space (Fig. 2). The NW Europe branch is divided into two main subbranches, separating British and Irish individuals (i.e., those with a British or Irish birthplace) from continental Europeans. We detect 15 clusters of predominantly British or Irish membership, which we attribute to the larger sampling numbers from Britain resulting from treating each country within the United Kingdom as a separate sampling region. We observed a split in Britain and Ireland between the eastern populations of the British Isles (e.g., England and Wales) and the northwestern (e.g., Ireland and Scotland). We report genetic results from the Channel Islands, an archipelago off the northern French coast. Additionally, one cluster of predominant French membership (“France”) groups with English clusters, possibly reflecting gene flow across the channel or a signature of genetic affinity of northwestern France with neighboring Britain (23, 33). Evidence of such admixture is supported in our “nnls” analysis, where France is a mixture of “England 2” and another French membership cluster, namely, “France & Switz.”
Forming an outgroup to the rest of the branch, French, Swiss, Belgian, Dutch, and Scandinavian individuals are branched together, with subbranches separating Scandinavian countries (including Denmark) from the others. All 17 Icelandic individuals sampled are branched with Norwegians, which we attribute to the small sample of Icelandic individuals. In the nnls analysis we observe that the Netherlands and Denmark clusters show evidence of haplotype sharing consistent with their geographic proximity.
Central-Eastern Europe.
The CE Europe branch contains the following three subbranches: NE Europe (with Baltic, Polish, and Russian membership), CE Europe (with membership from the north of the Balkans and the center and east of Europe), and Finland as an outgroup. Consistent with previous observations (34, 35) Finland shows evidence of isolation from other European regions, projecting away in PC space and showing high differentiation in FST and total variance distance (TVD) values (Datasets S2 and S3). The majority of Estonian individuals are placed in the “Mixed Scand.” cluster that also groups Swedish, Norwegian, and Finnish individuals who project between Finland and the rest of Scandinavia.
Individuals from the Baltic countries Latvia and Lithuania are clustered together. Estonian individuals project intermediately between Finnish and Baltic–Russian individuals in PC space (Fig. 1), suggesting Finnish-to-Baltic gene flow. Grouped with the “Latvia & Lithuania” clusters are two clusters of predominantly Polish and Russian membership, respectively, forming a cline from east to west (Fig. 2).
The CE Europe branch bridges northeastern and western Europe in PC space (Fig. 2), containing two clusters, namely, “CE Europe 1 and 2.” The latter groups individuals from more western countries and the former more eastern, apparently reflecting the east to west cline in central Europe. This gradual cline in central European genetics (Fig. 2) highlights the need for analyzing continuous genetic data concurrently with clustering analyses for full context of such genetic structure.
The two clusters of northern Balkans membership include 246 of the 448 individuals from the Balkans/SE Europe region (which we additionally associated Romania with). Cluster membership is predominantly from countries north of Albania/Greece/North Macedonia—suggesting a north/south divide on the peninsula that is echoed in PCA (Fig. 2). The “NW Balkans” and “NE Balkans” clusters demonstrate a further geographic cline from east to west, with the former grouping Croatian, Bosnian and Herzegovinian, and Serbian individuals and the latter Romanian and Bulgarian. Our nnls analysis models more southern ancestry (proxied by Greek individuals) in the NE Balkans cluster than the NW (SI Appendix, Fig. 4.3).
Southern Europe.
Leiden clustering yields three broad groups of clusters with southern European membership, as follows: one grouping individuals born around the eastern Mediterranean (i.e., Italy, Greece, Turkey, and Cyprus), one grouping individuals from the Iberian Peninsula (Spain and Portugal), and finally two outgroup clusters (“Mixed European” and “Malta”).
Most sampled Greek individuals form a single cluster that in the nnls analysis is modeled as a mixture of haplotypes from neighboring southern clusters such as “Italy” and “Turkey,” as well as haplotypes from the north, from NE Balkans. We observe a smaller cluster grouped with the “Greece” cluster on the dendrogram (Fig. 2) that contains all sampled Albanian individuals and also projects separately to Greece in PCA; this finding is suggestive of an additional structure that we explore in our IBD-based analyses below. Elsewhere in this branch are the majority of Cypriot, Turkish, and Italian samples, grouped into their own three respective clusters. In Italy, our clustering approach does not resolve the north–south clustering previously observed (4). In the Iberian branch, we group individuals from Spain and Portugal into their own respective clusters, which is consistent with previous findings (3), although the “Portugal” cluster contains approximately a quarter of sampled Spanish individuals. In addition, a cluster of mixed French and Swiss membership (France & Switz.) is grouped with Spain and Portugal, appearing to group individuals with haplotype sharing between Switzerland, Italy, France, and Spain in the nnls analysis (SI Appendix, Fig. 4.3).
Measures of inbreeding differentiate European genetic histories. (A) The per-Leiden-cluster mean total length of autosomal ROH >1.5 Mb. (B) The average total length of ROH >1.5 Mb versus the average number of ROH for each Leiden cluster, differentiating the burden of long/short ROH in each cluster. Error bars show 95% confidence intervals. (C) The mean FROH and FSNP values for each Leiden cluster, with 95% confidence intervals shown as error bars. Mean FROH is an estimate of the total inbreeding relative to an unknown base generation. Mean FSNP is an estimate of inbreeding in the current generation, with FSNP = 0 indicating random breeding, FSNP <0 indicating inbreeding avoidance, and FSNP >0 indicating inbreeding. Thus, 1) points along the x axis show excess homozygosity not explained by ROH (caused for example by admixture or excess allele frequency drift compared to coanalyzed samples), 2) points along the y axis indicate that homozygosity is caused by historical small population size rather than consanguinity, and 3) points along the solid diagonal line indicate that all excess population homozygosity can be accounted for by ROH.
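To make the two inbreeding measures in that caption concrete, here is a minimal Python sketch. It assumes FROH is the summed length of ROH longer than 1.5 Mb divided by the autosomal genome length, and that FSNP follows the common excess-homozygosity form (observed minus expected homozygotes, over sites minus expected). The function names, the autosome length constant and the numbers are my own placeholders, not taken from the paper.

```python
# Sketch only: names and constants below are illustrative assumptions.
MIN_ROH_BP = 1_500_000        # 1.5 Mb threshold mentioned in the caption
AUTOSOME_BP = 2_880_000_000   # ASSUMPTION: rough autosomal length, not from the paper


def f_roh(roh_lengths_bp, autosome_bp=AUTOSOME_BP, min_bp=MIN_ROH_BP):
    """Fraction of the autosomes covered by ROH longer than min_bp."""
    covered = sum(length for length in roh_lengths_bp if length > min_bp)
    return covered / autosome_bp


def f_snp(observed_hom, expected_hom, n_sites):
    """Excess homozygosity: (O - E) / (N - E).

    0 means homozygosity matches expectation, > 0 means an excess
    (inbreeding), < 0 means a deficit (inbreeding avoidance).
    """
    return (observed_hom - expected_hom) / (n_sites - expected_hom)


# Hypothetical individual: three ROH, one of them below the threshold.
print(f_roh([2_000_000, 1_000_000, 4_000_000]))
print(f_snp(observed_hom=700, expected_hom=650, n_sites=1000))
```

Read against panel C: an individual whose FSNP is close to its FROH sits near the diagonal, meaning ROH accounts for the excess homozygosity.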
We first differentiated demographic histories characterized by long IBD-segments shared within populations (for example, recent isolation or practiced endogamy) from those characterized by large numbers of short IBD-segments (for example, a historical bottleneck). For individuals within the same cluster, we plotted the per-individual mean total length of IBD versus the mean number of IBD segments shared (Fig. 3), showing the overall distribution of such values across all clusters (Fig. 3, Left) and focusing on subbranches of related clusters (Fig. 3B). Generally, we confirm a broad south–north gradient of increasing haplotype sharing in Europe. Individuals of Finnish ancestry present some of the highest levels of within-population IBD sharing in our sample of European haplotypes, and this sharing is dominated by short IBD-segments. These Finnish results are consistent with a historical bottleneck and previous genetic observations (38, 39), as well as with one of the lowest estimated historical Ne in our analysis (Fig. 4) and with the average coordinates of the two inbreeding coefficients for the Finnish clusters (Fig. 5): the proportion of the genome covered by ROH > 1.5 Mb (FROH), and the coefficient measured from the number of observed versus expected homozygotes (FSNP).
Although S Europe trends toward a larger historical effective population size (Ne) and low levels of haplotype sharing (e.g., within Italy), there are notable exceptions. Maltese genetic differentiation is expanded upon in IBD-segment analysis (Fig. 3) and agrees with previous IBD estimates from a smaller Maltese sample (40). Malta has a slightly lower average within-cluster total IBD length than Finland, although the average Maltese IBD-segment is longer (Malta IBD segment = 3.17 cM, Finland = 2.07 cM), suggesting a more recent source of this elevated sharing. These results are matched by low historical Ne. FROH and FSNP analysis show that autozygosity is consistent with a historically small Ne rather than consanguinity in Malta.
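A rough Python sketch of the total-length versus segment-count comparison described above. The segment lists are made-up toy values (not Malta or Finland data) and `summarize_ibd` is a hypothetical helper; the point is only that two groups can share the same total IBD while differing in mean segment length, which is what separates recent sharing (few long segments) from an older bottleneck (many short ones).

```python
def summarize_ibd(segments_cm):
    """Return total IBD length, segment count and mean segment length (cM)."""
    n_segments = len(segments_cm)
    total_cm = sum(segments_cm)
    mean_cm = total_cm / n_segments if n_segments else 0.0
    return {"total_cm": total_cm, "n_segments": n_segments, "mean_segment_cm": mean_cm}


# HYPOTHETICAL toy data: same total sharing, different segment profiles.
few_long = [6.0, 5.5, 7.0]
many_short = [2.0, 1.8, 2.2, 2.1, 1.9, 2.0, 2.3, 2.1, 2.1]

for label, segments in (("few long", few_long), ("many short", many_short)):
    print(label, summarize_ibd(segments))
```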
Within SE Europe, both Turkey and Cyprus exhibit elevated haplotype sharing (Fig. 3), as well as a lower historical Ne (Fig. 4) and evidence of modest consanguinity (Fig. 5). The IBD sharing profile of the Albania and Greece cluster supports evidence of isolation, with elevated haplotype sharing that is equivalent to that found in northeastern Europe (Fig. 2). Albania and Greece also present a consistently lower Ne than Greece, and FROH/FSNP results are consistent with isolation. Elsewhere in SE Europe, we observe NW Balkans presenting slightly longer within-cluster IBD segments than NE Balkans, which is matched with a consistently lower Ne and elevated ROH, suggestive of a smaller population than the northeast of the Balkans or neighboring central Europe to the north. Interestingly, we find that a subset of Spanish individuals who present elevated within-cluster IBD-segment sharing differ from most of those in Spain (Fig. 3). In a focused analysis (SI Appendix, Supplementary Data 9), we conclude that these represent a distinct community or population clustered with other Spanish individuals who nevertheless exhibit elevated haplotype sharing consistent with isolation. These individuals project from other Spanish individuals in PC six (SI Appendix, Supplementary Data 9) and when projected on top of Human Origin references project toward “Spanish North” or “Basque” references (SI Appendix, Fig. 4.2).
Continuing previous observations of elevated haplotype sharing in island populations, we confirm previous signatures of isolation in island communities in northern Britain and expand with more results. We observe increased haplotype sharing of the British archipelago communities of Orkney, the Channel Islands, and the Isle of Man (Fig. 3) that is consistent with previous observations from Orkney (2), showing results from the Channel Islands and expanding upon previous analyses of the Isle of Man (9). These footprints appear to be more pronounced in Orkney, with a smaller Ne 10 generations ago (Fig. 4), as well as slightly longer IBD segments than those shared within the Channel Islands (Orkney IBD segment = 2.18 cM, Channel Island segment = 1.86 cM, Isle of Man segment = 1.90 cM). While Icelandic individuals do not form a private cluster (Fig. 2), there is evidence of elevated IBD sharing consistent with previous observations of homogeneity (41). We observe an increase of the total length of IBD and number of IBD segments between “Norway” Icelanders (i.e., Icelandic individuals placed in the cluster Norway) (45.9 cM and 20 segments, respectively) compared to that observed between Norway Norwegians (22.6 cM and 14 segments). This difference was significant both for total length of IBD (Mann–Whitney U, P = 4.5 × 10^-11) and for number of segments (Mann–Whitney U, P = 1.1 × 10^-10).
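For anyone unfamiliar with the Mann–Whitney U test quoted above, here is a small stdlib-only Python sketch of how such a comparison of two groups works. It counts pairwise wins to get U and uses a plain normal approximation for the two-sided P value, with no tie or continuity correction. This is an illustration under those assumptions, not the authors' actual procedure, and the sample values are invented.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-sided P from a normal approximation).

    U counts the pairs where a value from sample_a exceeds a value
    from sample_b, with ties counted as half a win.
    """
    u_a = 0.0
    for x in sample_a:
        for y in sample_b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5
    n_a, n_b = len(sample_a), len(sample_b)
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_two_sided


# HYPOTHETICAL total IBD lengths (cM) for two groups of pairs.
group_1 = [44.0, 47.5, 46.1, 45.2, 48.3]
group_2 = [21.9, 23.4, 22.0, 24.1, 22.8]
print(mann_whitney_u(group_1, group_2))
```

With samples this small the normal approximation is crude; it behaves better with the hundreds of pairwise comparisons a real cluster would give.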
Lastly, in an analysis of country-of-birth versus PCA of genetic relationship matrices, and network-based clustering methods, we have identified a community of individuals sampled from the UKBB with evidence of Ashkenazi Jewish ancestry. An analysis of haplotype sharing patterns supports a population of increased haplotype sharing that is intermediate between our Finnish and Maltese profiles (Fig. 3), with a low historical Ne 30 generations ago that has expanded within the past 10 generations (Fig. 4) and an increase in homozygosity (Fig. 5). In a focused analysis, we show that this cluster contains two broad groups of individuals, as follows: one with elevated haplotype sharing, with more and longer IBD and ROH detected, and another with a mixture of ancestries reflective of individuals with a recent admixture outside of the community and a generally higher historical Ne. This cline of elevated haplotype sharing is captured by PC three of the PCA of the pbwt paint coancestry matrix.
It's a good study but it has to be mentioned that all the national samples are just from UK residents born in a particular country, so it won't be 100% ethnically accurate, hence oddities like Hungary being more southern than Croatia, Cyprus being more northern than Italy, and Maltese being near French. The last two are obviously from British or half-British born in those countries.
Germany is also probably shifted NW due to the number of people from British military families born there after the war.
Last edited by Creoda; 03-13-2023 at 06:35 PM.
In case anyone's interested in looking at the FST table from this study, click here: https://docs.google.com/spreadsheets...it?usp=sharing.
The table was supposed to be available as Dataset S03, but the article doesn't provide any link to such a dataset. I contacted one of the co-authors, Dr Edmund Gilbert, and he kindly provided it to me.
Too big to screenshot. But it's working now, thanks for the heads up.