The raw data is not phased, so there is no point in doing it.
But don't our parents have different alleles as well? They can't have the same TT, GG, CC etc. on allele 1 and 2, just like we don't.
One thing that I do not understand is why you are slightly closer to one parent than the other. Shouldn't you be halfway in between? Or is it possible to be genetically closer to one parent?
Is DNA a Code?
https://evo2.org/dna-atheists/dna-code/
It’s technically and practically not possible.
You only get 50% of each parent, one letter of their two for every nucleotide in the genome. You can’t restore 100% from 50%.
Let's say that for one SNP your father is AG and your mother is AA.
You got AA too.
Doing what is mentioned in this thread won’t work, because you don’t know whether both of your parents are AA, both are AG, or one is AG and one is AA. From all three options it is possible for you to get AA.
This applies to the hundreds of thousands of SNPs that AncestryDNA tests for. Statistically speaking, you would get most SNPs wrong, and you would end up with very inaccurate raw data.
It cannot work and I don’t understand how no one except user “Dick” realized it before.
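To make the ambiguity in the post above concrete, here is a minimal Python sketch that lists every pair of parental genotypes able to produce a given child genotype at one SNP. The function name `compatible_parent_pairs` and the two-allele A/G setup are only illustrative assumptions, not part of any real tool.

```python
from itertools import combinations_with_replacement, product


def compatible_parent_pairs(child, alleles="AG"):
    """List unordered parent genotype pairs that could produce `child` at one SNP."""
    child = "".join(sorted(child))
    # All unphased genotypes possible with the given alleles: AA, AG, GG.
    genotypes = ["".join(g) for g in combinations_with_replacement(sorted(alleles), 2)]
    pairs = []
    for p1, p2 in combinations_with_replacement(genotypes, 2):
        # Each parent passes on exactly one of its two alleles.
        offspring = {"".join(sorted(a + b)) for a, b in product(p1, p2)}
        if child in offspring:
            pairs.append((p1, p2))
    return pairs


print(compatible_parent_pairs("AA"))
```

For a child who is AA this lists AA/AA, AA/AG and AG/AG, which are the three options named in the post. The child's genotype alone cannot tell them apart.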
OP is talking about phasing, not recreation of paternal/maternal DNA.
Quote:
Phasing is the task or process of assigning alleles (the As, Cs, Ts and Gs) to the paternal and maternal chromosomes. The term is usually applied to types of DNA that recombine, such as autosomal DNA or the X-chromosome. Phasing can help to determine whether matches are on the paternal side or the maternal side, on both sides or on neither side. Phasing can also help with the process of chromosome mapping – assigning segments to specific ancestors. The use of phased data reduces the number of false positive matches, particularly for smaller segments under 15 centiMorgans (cMs).
https://isogg.org/wiki/Phasing
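As a rough illustration of the definition quoted above, the following Python sketch phases a single SNP when only one parent is tested. It is a simplified assumption of how per-SNP phasing with one parent can work, not the algorithm of GEDmatch or any company, and `phase_with_one_parent` is a hypothetical name.

```python
def phase_with_one_parent(child, parent):
    """Return (allele_from_tested_parent, allele_from_other_parent).

    Returns None when the SNP is ambiguous or the genotypes are inconsistent.
    """
    a, b = child
    if a == b:
        # Homozygous child: both chromosomes carry the same allele.
        return (a, a) if a in parent else None
    shared = [x for x in (a, b) if x in parent]
    if len(shared) == 1:
        inherited = shared[0]
        other = b if inherited == a else a
        return (inherited, other)
    # Both heterozygous for the same alleles (ambiguous), or no allele shared (inconsistent).
    return None


print(phase_with_one_parent("AG", "AA"))  # ('A', 'G')
print(phase_with_one_parent("AG", "AG"))  # None
```

The second call shows the weak spot: wherever child and parent are both heterozygous, this simple rule cannot decide and the SNP stays unphased unless the other parent or statistical phasing resolves it.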
Quote:
Statistical phasing
It is not always possible to obtain trios for phasing and, even if it were, it is not economical or computationally feasible to phase large trio datasets. Sophisticated statistical algorithms have been developed which phase the data based on allele frequencies derived from reference populations. A number of programs are available such as Beagle and FastIBD. Phasing can be done with a high degree of accuracy if large enough reference cohorts are available which are representative of the populations being studied. However, with genotype data the current methodologies are not able to reliably phase small segments under 5 cMs. One study reported a false positive rate of over 67% for 2-4 cM segments when compared with trios.[2]
Statistical or population-based phasing works because our DNA is all very similar and because it's passed on in chunks. Think of it like trying to read a sentence when some of the letters are missing. There are only so many combinations that will fit in the available spaces. If you saw these words:
R-d is my f-v--r-t- c-l--r
You would probably be able to work out that the sentence should read:
Red is my favourite colour
There are regional variations in the "sentences" but even if there were a couple of "deletions" you'd still be able to work it out:
Red is my favorite color
Difficulties arise when you have a short word without the context of a full sentence. R-d on its own could be red, rid, or rod.
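The missing-letter analogy above can be tried out directly. This small Python sketch matches a gapped pattern against a word list; the vocabulary is a made-up stand-in for a reference panel, and `candidates` is a hypothetical helper. It only illustrates the analogy and is not a phasing algorithm.

```python
import re


def candidates(pattern, vocabulary):
    """Return the words that fit a pattern in which '-' marks one missing letter."""
    regex = "".join("[a-z]" if ch == "-" else re.escape(ch) for ch in pattern)
    return [word for word in vocabulary if re.fullmatch(regex, word)]


# Hypothetical "reference panel" of known words.
vocabulary = ["red", "rid", "rod", "road", "favourite", "favorite", "colour", "color"]

print(candidates("f-v--r-t-", vocabulary))  # ['favourite']
print(candidates("r-d", vocabulary))        # ['red', 'rid', 'rod']
```

The long pattern has a single fit, while the short one has three. That mirrors why short segments are hard to phase reliably.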
Quote:
Genetic genealogy companies
The raw genotype data generated by the Illumina microarray chips used for the autosomal DNA tests from the genetic genealogy companies is unphased and therefore does not distinguish the alleles on the maternal and paternal chromosomes. Customers who download their raw data file will observe that in the genotype column there are two DNA letters for each SNP. These letters are unsorted and could have come from either parent.
AncestryDNA and MyHeritage DNA are currently the only two companies which phase the data before assigning matches. Ancestry has developed its own phasing algorithm known as Underdog. The technical details are provided in the AncestryDNA Matching White Paper. They claim to have an error rate of under 1% and the error rate improves as the size of the training reference dataset increases. As of the beginning of 2016, AncestryDNA uses a reference panel of more than 300,000 genotypes. The details of MyHeritage DNA's phasing are given in their blog post on major updates and improvements to MyHeritage DNA matching. See also the presentation given by Yaniv Erlich, MyHeritage DNA's Chief Scientific Officer, at Rootstech 2018 MyHeritage DNA 1010: from test to results
Note, however, that if you download the raw data from AncestryDNA or MyHeritage to upload to third-party sites you will receive a file of unphased data.
The 23andMe test and the Family Finder test from Family Tree DNA do not phase the data before assigning matches. However, 23andMe uses statistical phasing for their Ancestry Composition. If one or both parents has been tested at 23andMe Ancestry Composition can determine which ancestral segments have been inherited from each parent. For a detailed explanation see the 23andMe article on The phasing process.
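To show what "unsorted" means for the two letters in a raw data file, here is a small Python sketch. It assumes a tab-separated layout with the columns rsid, chromosome, position, allele1, allele2; real downloads may differ by company and file version. The rsIDs and positions in the sample are invented for illustration.

```python
def parse_genotypes(lines):
    """Map each rsid to its unphased genotype, with the two letters sorted."""
    genotypes = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines, comment lines and the column header.
        if not line or line.startswith("#") or line.startswith("rsid"):
            continue
        rsid, _chromosome, _position, allele1, allele2 = line.split("\t")
        # The order of the letters carries no parental information, so normalise it.
        genotypes[rsid] = "".join(sorted(allele1 + allele2))
    return genotypes


# Invented sample rows, not real data.
sample = [
    "rsid\tchromosome\tposition\tallele1\tallele2",
    "rs0000001\t1\t1000\tG\tA",
    "rs0000002\t1\t2000\tC\tC",
]
print(parse_genotypes(sample))  # {'rs0000001': 'AG', 'rs0000002': 'CC'}
```

Because "G A" and "A G" describe the same unphased genotype, nothing in such a file says which letter came from which parent.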
My mother and father have different ancestry. If I do this, I will find out whether this is possible or not.
How do you tell which is your father's and which is your mother's? My father has never tested at Ancestry. My mother never had her DNA tested before she died.
I gave this method a shot. I am not sure what the hell this is doing, but it is interesting. Unfortunately, I would conclude it does not work very well. One of the kits I created is closer to my mother, and one is closer to my father, but ultimately they both seem pretty off.
Here is my mom's real kit vs. her manufactured (this thread's method) kit:
Mom's real kit
# Population Percent
1 North_Atlantic 45.35
2 Baltic 23.78
3 West_Med 13.84
4 East_Med 7.26
5 West_Asian 6.36
6 Red_Sea 1.09
7 South_Asian 1.07
8 Oceanian 0.36
9 Sub-Saharan 0.36
10 Northeast_African 0.29
11 Amerindian 0.13
12 Siberian 0.11
Single Population Sharing:
# Population (source) Distance
1 South_Dutch 2.54
2 West_German 2.79
3 Southeast_English 5.28
4 North_German 5.49
5 Danish 6.61
6 North_Dutch 6.66
7 Orcadian 6.96
8 Southwest_English 7.31
9 Irish 7.96
10 West_Scottish 8.47
Mom's manufactured kit
# Population Percent
1 North_Atlantic 41.54
2 Baltic 26.12
3 West_Med 13.5
4 East_Med 7.19
5 West_Asian 6.4
6 Amerindian 1.71
7 South_Asian 1.69
8 Red_Sea 0.84
9 Siberian 0.75
10 Oceanian 0.26
Single Population Sharing:
# Population (source) Distance
1 West_German 4.26
2 South_Dutch 5.19
3 Austrian 5.68
4 North_German 6.55
5 East_German 6.81
6 Danish 8.67
7 Southeast_English 8.73
8 North_Dutch 8.94
9 Orcadian 9.71
10 Hungarian 9.85
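One way to put a number on "pretty off" is to measure how far apart the two admixture profiles above are. The Python sketch below uses a plain Euclidean distance over the component percentages; that choice is an assumption made for illustration, and the calculator's own Distance column may be computed differently. Components missing from one kit are treated as 0.

```python
import math

# Percentages copied from the two tables above.
real = {
    "North_Atlantic": 45.35, "Baltic": 23.78, "West_Med": 13.84, "East_Med": 7.26,
    "West_Asian": 6.36, "Red_Sea": 1.09, "South_Asian": 1.07, "Oceanian": 0.36,
    "Sub-Saharan": 0.36, "Northeast_African": 0.29, "Amerindian": 0.13, "Siberian": 0.11,
}
manufactured = {
    "North_Atlantic": 41.54, "Baltic": 26.12, "West_Med": 13.5, "East_Med": 7.19,
    "West_Asian": 6.4, "Amerindian": 1.71, "South_Asian": 1.69, "Red_Sea": 0.84,
    "Siberian": 0.75, "Oceanian": 0.26,
}


def admix_distance(kit_a, kit_b):
    """Euclidean distance between two admixture profiles (missing components count as 0)."""
    components = set(kit_a) | set(kit_b)
    return math.sqrt(sum((kit_a.get(c, 0.0) - kit_b.get(c, 0.0)) ** 2 for c in components))


print(round(admix_distance(real, manufactured), 2))
```

Most of the gap comes from North_Atlantic (about 3.8 points lower in the manufactured kit) and Baltic (about 2.3 points higher).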
Whoa thanks.
My dad doesn’t want to test because he believes in those DNA conspiracies.
Time to plot him >:)
Someone who actually has his parents tested should try this and tell us how similar results are.
My mom is tested; the comparison of her real kit and her manufactured kit is in my post above.
This is the other manufactured kit, if we can call it my dad's, haha, idk. It is actually quite similar to my result.
# Population Percent
1 North_Atlantic 36.39
2 Baltic 25.45
3 West_Med 14.25
4 East_Med 9.31
5 West_Asian 7.24
6 Red_Sea 2.51
7 Sub-Saharan 2.41
8 Siberian 0.76
9 South_Asian 0.62
10 Oceanian 0.59
11 Northeast_African 0.46
Single Population Sharing:
# Population (source) Distance
1 Austrian 5.11
2 West_German 7.18
3 East_German 7.33
4 Hungarian 7.52
5 South_Dutch 8.54
6 Serbian 10.49
7 French 10.67
8 North_German 11.48
9 Croatian 12.71
10 Moldavian 13.16
11 Southeast_English 13.65
12 Romanian 13.85
13 Danish 13.85
14 North_Dutch 14.02
15 Swedish 14.64
16 Orcadian 14.84
17 Southwest_English 15.41
18 Norwegian 15.6
19 Irish 15.84
20 North_Swedish 16.05
Mixed Mode Population Sharing:
# Primary Population (source) Secondary Population (source) Distance
1 61.6% Swedish + 38.4% Greek_Thessaly @ 3.03
2 91.1% Austrian + 8.9% Mozabite_Berber @ 3.08
3 61.8% Serbian + 38.2% West_Scottish @ 3.09
4 66.4% West_German + 33.6% Moldavian @ 3.11
5 59.2% Serbian + 40.8% Orcadian @ 3.11
6 60.8% Serbian + 39.2% Irish @ 3.13
7 91% Austrian + 9% Algerian @ 3.14
8 91.2% Austrian + 8.8% Tunisian @ 3.18
9 60.2% Serbian + 39.8% Southwest_English @ 3.24
10 61.8% South_Dutch + 38.2% Moldavian @ 3.27
11 91.6% Austrian + 8.4% Moroccan @ 3.35
12 57.8% Serbian + 42.2% North_Dutch @ 3.37
13 60.1% Hungarian + 39.9% French @ 3.42
14 67.8% Swedish + 32.2% East_Sicilian @ 3.43
15 57.2% Serbian + 42.8% Southeast_English @ 3.47
16 50.3% Romanian + 49.7% North_Dutch @ 3.48
17 57.6% Serbian + 42.4% Danish @ 3.51
18 96.2% Austrian + 3.8% Bantu_N.E. @ 3.51
19 96.2% Austrian + 3.8% Luhya @ 3.52
20 96.4% Austrian + 3.6% Biaka_Pygmy @ 3.53
Here is my dad's actual phased kit, done with the in-house GEDmatch application. This is supposed to be the 50% of my DNA that I inherited from my dad.
# Population Percent
1 North_Atlantic 29.45
2 Baltic 24.19
3 West_Med 12.97
4 West_Asian 8.03
5 East_Med 7.38
6 South_Asian 4.15
7 Red_Sea 3.76
8 Siberian 3.06
9 Northeast_African 2.16
10 Amerindian 1.4
11 East_Asian 1.21
12 Oceanian 1.17
13 Sub-Saharan 1.05
Single Population Sharing:
# Population (source) Distance
1 Hungarian 9.32
2 Serbian 9.9
3 Austrian 10.37
4 Moldavian 11.01
5 East_German 11.59
6 Croatian 12.16
7 Romanian 12.3
8 West_German 13.33
9 South_Dutch 14.59
10 Bulgarian 15.01
11 French 15.49
12 Ukrainian_Lviv 16.54
13 North_German 16.97
14 South_Polish 16.99
15 Ukrainian 18.21
16 Danish 19.37
17 North_Swedish 19.47
18 Spanish_Galicia 19.52
19 Swedish 19.62
20 Southeast_English 19.64
Mixed Mode Population Sharing:
# Primary Population (source) Secondary Population (source) Distance
1 81.8% Austrian + 18.2% Aghan_Hazara @ 5.48
2 81.4% Austrian + 18.6% Uzbeki @ 5.5
3 83.3% Austrian + 16.7% Hazara @ 5.63
4 80.6% Austrian + 19.4% Afghan_Turkmen @ 5.72
5 81.7% Austrian + 18.3% Afghan_Tadjik @ 5.8
6 79% East_German + 21% Turkmen @ 6
7 80% East_German + 20% Aghan_Hazara @ 6.02
8 57.7% French + 42.3% Tatar @ 6.07
9 79.5% East_German + 20.5% Afghan_Tadjik @ 6.07
10 82.8% Austrian + 17.2% Uygur @ 6.07
11 79.6% East_German + 20.4% Uzbeki @ 6.22
12 78.6% East_German + 21.4% Afghan_Turkmen @ 6.29
13 81.9% Austrian + 18.1% Turkmen @ 6.29
14 81.1% Austrian + 18.9% Tadjik @ 6.36
15 85.3% Hungarian + 14.7% Aghan_Hazara @ 6.37
16 81.8% East_German + 18.2% Hazara @ 6.39
17 85.7% Austrian + 14.3% Burusho @ 6.48
18 80% Austrian + 20% Nogay @ 6.53
19 51.2% Spanish_Galicia + 48.8% Tatar @ 6.58
20 82.5% East_German + 17.5% Bedouin @ 6.6
I'd say the tool could definitely use some improvement, at least if you only have one parent available for phasing. When I use the tool to phase my paternally-inherited DNA, it's a bit of a mess. Still, if you are looking for an encouraging sign, it does predict Hungarian as the closest population, albeit at a large distance:
Paternal side, phased (GEDmatch tool): the Admix Results and Single Population Sharing tables are the same as in the phased kit posted above.
Hopefully your phasing is a bit cleaner and less noisy than what I got -- please let me know when you've done it, I'd like to see the result.
Can someone explain to me in more detail how to do this? When I upload my parents' data to GEDmatch I get a "you're not allowed to use this file" message, or something like that. Thank you.