A genome-wide genetic signature of Jewish ancestry perfectly separates individuals
with and without full Jewish ancestry in a large random sample of European
Americans


Background: It was recently shown that the genetic distinction between self-identified Ashkenazi Jewish and non-Jewish individuals is a prominent component of genome-wide patterns of genetic variation in European Americans. No study, however, has yet assessed how accurately self-identified (Ashkenazi) Jewish ancestry can be inferred from genomic information, or whether the degree of Jewish ancestry can be inferred among individuals with fewer than four Jewish grandparents.

Results: Using a principal components analysis, we found that the individuals with full Jewish ancestry formed a clearly distinct cluster from those individuals with no Jewish ancestry. On the first principal component axis, every single individual with self-reported full Jewish ancestry had a higher score than any individual with no Jewish ancestry.

Conclusions: Here we show that within Americans of European ancestry there is a perfect genetic corollary of Jewish ancestry which, in principle, would permit near-perfect genetic inference of Ashkenazi Jewish ancestry. In fact, even subjects with a single Jewish grandparent can be statistically distinguished from those without Jewish ancestry. We also found that subjects with Jewish ancestry were slightly more heterozygous than the subjects with no Jewish ancestry, suggesting that the genetic distinction between Jews and non-Jews may be more attributable to a Near-Eastern origin for Jewish populations than to population bottlenecks.

Background

Many genetic and non-genetic lines of evidence make clear that there are differences amongst the Jewish and non-Jewish peoples of Europe. There are both specific genetic diseases (e.g. Tay-Sachs) and particular mutations (e.g. the breast cancer BRCA1 185delAG mutation) that have considerably higher incidences in Jewish populations, and both Y chromosome and mtDNA lineages show associations with Jewish heritage [1-5].
No study, however, has directly addressed the question of whether Jewish individuals form a consistently identifiable group on the basis of genetic data alone, as has been documented for other racial/ethnic groups [6]. Recently, Price et al. [7] showed that self-described Jewish ancestry was a major determinant of population genetic structure in European populations, but they did not address the question of whether genetic data might be able to accurately identify which individuals do and do not have Jewish ancestry. Here we investigate whether it is possible to accurately infer the degree of Jewish ancestry using only an individual’s genomic information.

To address this, we considered a random sample of 611 unrelated self-described Caucasian subjects, mostly residing in America, who specifically reported whether they had Jewish ancestry and, if so, how many grandparents were “Jewish”. All individuals were genotyped for approximately 550,000 polymorphic markers, and we applied a principal-component-based method to describe the population-genetic structure [8] of the sample. Out of the 611 subjects, 507 reported no Jewish ancestry, 55 reported 4 Jewish grandparents, 4 reported 3 Jewish grandparents, 37 reported 2 Jewish grandparents and 8 reported 1 Jewish grandparent. Of these, 23 reported that they were Ashkenazim, one reported four Sephardic grandparents, two reported three Ashkenazi and one Sephardic grandparent, and two reported two Sephardic grandparents. A further 62 provided European or Russian country-of-origin information for at least one grandparent and 14 were able to give no more information than “European-American”.
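As a rough illustration of how a first principal component score can be obtained from a genotype matrix, the sketch below centers each marker and finds the leading eigenvector of the individual-by-individual similarity matrix by power iteration. It is a generic PCA on invented toy genotypes (allele counts 0, 1 or 2), not necessarily the method of [8]; the function names `center_columns` and `first_pc_scores` are hypothetical, and no per-marker normalisation is applied.

```python
import math


def center_columns(genotypes):
    """Subtract each marker's mean allele count so every column has mean zero."""
    n = len(genotypes)
    means = [sum(col) / n for col in zip(*genotypes)]
    return [[g - m for g, m in zip(row, means)] for row in genotypes]


def first_pc_scores(genotypes, iterations=200):
    """Approximate PC1 (one unit-length eigenvector entry per individual)."""
    centered = center_columns(genotypes)
    n = len(centered)
    # Individual-by-individual similarity matrix (n x n).
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered]
            for ri in centered]
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-zero starting vector
    for _ in range(iterations):
        vec = [sum(g * v for g, v in zip(row, vec)) for row in gram]
        norm = math.sqrt(sum(v * v for v in vec))
        if norm == 0:
            raise ValueError("no genetic variation between individuals")
        vec = [v / norm for v in vec]
    return vec


# Toy data, invented for illustration: rows are individuals, columns are
# markers. The first three rows and the last three rows form two groups.
toy_genotypes = [
    [0, 0, 1, 2],
    [0, 1, 1, 2],
    [0, 0, 0, 2],
    [2, 2, 1, 0],
    [2, 1, 1, 0],
    [2, 2, 2, 0],
]
scores = first_pc_scores(toy_genotypes)
```

The sign of a principal component is arbitrary, so only the relative positions of individuals along the axis carry meaning. With hundreds of thousands of markers a dedicated linear-algebra library would be the practical choice.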

Results

Our first test was to assess how accurately individuals with full Jewish ancestry (all four grandparents) could be distinguished from those with no Jewish ancestry using the score on the first principal component axis (PC1). We found that the individuals with full Jewish ancestry formed a clearly distinct cluster from those individuals with no Jewish ancestry (Figure 1). Strikingly, if we look only at the position on the first principal component, in this dataset, every single individual with self-reported full Jewish ancestry has a higher score than any individual with no Jewish ancestry. Interestingly, of the two subjects that appear intermediate between the clear “Jewish” and “Non-Jewish” clusters, one reports two Jewish grandparents of Sephardic origin, and the other declares full Jewish ancestry but gives no country of origin for their grandparents. These analyses imply the possibility of perfect or near-perfect resolution of full Jewish ancestry using only genetic data. We should note, however, that if one were to attempt inference about Jewish ancestry, it would be necessary to have a ‘training set’ such as that described here to determine the appropriate divisions between individuals with and without Jewish ancestry, since the ‘clusters’ fall next to each other. In practice, therefore, resolution of full Jewish ancestry would likely be less than perfect, but the non-overlapping distributions that we observed on the first principal component imply that both specificity and sensitivity would be high.
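The role of the training set can be sketched as follows: check whether the two groups overlap on PC1, place a cutoff between them, and then measure sensitivity and specificity on other individuals. This is a minimal sketch under stated assumptions, not the procedure used in the study: the midpoint rule, the function names and all PC1 values below are invented for illustration.

```python
def separation_threshold(full_scores, none_scores):
    """Return a midpoint cutoff if the groups do not overlap on PC1, else None."""
    lowest_full = min(full_scores)
    highest_none = max(none_scores)
    if lowest_full <= highest_none:
        return None  # distributions overlap; no single cutoff separates them
    return (lowest_full + highest_none) / 2


def sensitivity_specificity(full_scores, none_scores, cutoff):
    """Call a score above the cutoff 'full ancestry' and score the calls."""
    true_pos = sum(s > cutoff for s in full_scores)
    true_neg = sum(s <= cutoff for s in none_scores)
    return true_pos / len(full_scores), true_neg / len(none_scores)


# Made-up training PC1 scores (not data from the study).
train_full = [0.11, 0.12, 0.10, 0.13]
train_none = [-0.02, 0.00, 0.01, -0.01, 0.02]
cutoff = separation_threshold(train_full, train_none)

# Made-up held-out scores, chosen so that the cutoff learned on the
# training set is no longer perfect on new individuals.
test_full = [0.12, 0.05]
test_none = [0.00, 0.01, 0.07]
sens, spec = sensitivity_specificity(test_full, test_none, cutoff)
```

A cutoff that separates the training groups perfectly can still misclassify new individuals who fall near the boundary, which is why perfect separation in one sample does not guarantee perfect inference in practice.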

We went on to assess whether participants with 1, 2 or 3 Jewish grandparents could be statistically distinguished from one another and from individuals with either full or no Jewish ancestry. As expected, the majority of these subjects were positioned in between the non-Jewish and the full-Jewish subjects on PC1 (Figure 2).

All but two (36/37) of the subjects with two Jewish grandparents scored between 0.03 and 0.08 on PC1, all four subjects with three Jewish grandparents scored between 0.08 and 0.1 on PC1, and 496/507 subjects declaring no Jewish ancestry scored below 0.03.
The subjects with only one Jewish grandparent were not distinguishable based on PC1 position. The subjects that did not score within the expected range for their self-declared ancestry are shown in Table 1, along with their ancestral information where known. The majority of informative subjects with no Jewish ancestry that scored most highly on PC1 were either of Italian or Eastern Mediterranean descent. This indicates that in a mixed American context these populations may not be easily distinguishable from subjects with a single Jewish parent.
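Flagging subjects whose PC1 score falls outside the range expected for their self-declared ancestry can be sketched as a simple range lookup. The two ranges are the ones quoted above for subjects with two and three Jewish grandparents; treating the lower bound as inclusive and the upper bound as exclusive is an assumption, and the subject IDs, scores and function names are hypothetical.

```python
# PC1 ranges quoted in the text. Inclusive lower and exclusive upper
# bounds are an assumption made for this sketch.
REPORTED_RANGES = [
    (0.03, 0.08, "two Jewish grandparents"),
    (0.08, 0.10, "three Jewish grandparents"),
]


def expected_group(pc1_score):
    """Return the label whose reported PC1 range contains the score, or None."""
    for low, high, label in REPORTED_RANGES:
        if low <= pc1_score < high:
            return label
    return None


def out_of_range(subjects):
    """List subjects whose PC1 score is outside the range for their self-report."""
    return [
        (subject_id, declared, score)
        for subject_id, declared, score in subjects
        if expected_group(score) != declared
    ]


# Hypothetical subjects (IDs and scores are invented for illustration).
subjects = [
    ("S1", "two Jewish grandparents", 0.05),
    ("S2", "two Jewish grandparents", 0.01),
    ("S3", "three Jewish grandparents", 0.09),
]
flagged = out_of_range(subjects)
```

Here only the hypothetical subject S2 is flagged, because a score of 0.01 lies below the range reported for subjects with two Jewish grandparents.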