Thread: The Oracle Question

  1. #1
    ด้้้้้็็็็็้้้้้็็็็ ็้้้้ ้้้้็็็็็้ Mont's Avatar
    Join Date
    Nov 2020
    Last Online
    02-24-2024 @ 10:12 PM
    Location
    Currently at São Paulo, Brazil
    Ethnicity
    Allah's strongest soldier
    Ancestry
    97% Caucasoid, 3% Mongoloid
    Country
    Antarctica
    Y-DNA
    J1-Z2215
    mtDNA
    HV
    Taxonomy
    Mesorrhine, Hyperleptoprosopic, and Beautiful
    Politics
    National-Primitivism with Cat characteristics
    Hero
    Mum
    Religion
    Naturalism / Pantheism / Atheism
    Age
    21
    Gender
    Posts
    373
    Thumbs Up
    Received: 180
    Given: 80

    The Oracle Question

    Oracles have been popular for a while. You take a test based on K clusters, and the Oracle then compares the distance between your percentages and those of a given ethnic group, or of mixtures of ethnic groups, basically giving an accurate result of your ancestry, right? WRONG!
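
    A rough sketch of the distance comparison described above, assuming the Oracle uses a plain Euclidean distance between percentage vectors. The population names and reference values are invented for illustration and do not come from any real calculator.

```python
import math

# Hypothetical reference averages for a 3-cluster calculator (percentages).
# These values are made up for the example.
REFERENCE_POPULATIONS = {
    "Population A": [80.0, 15.0, 5.0],
    "Population B": [40.0, 50.0, 10.0],
    "Population C": [10.0, 20.0, 70.0],
}


def euclidean_distance(a, b):
    """Straight-line distance between two equally long score vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def oracle_ranking(user_scores, references):
    """Return (population, distance) pairs, closest population first."""
    distances = [
        (name, euclidean_distance(user_scores, ref))
        for name, ref in references.items()
    ]
    return sorted(distances, key=lambda pair: pair[1])


user = [75.0, 18.0, 7.0]  # hypothetical cluster percentages for one person
ranking = oracle_ranking(user, REFERENCE_POPULATIONS)
for name, dist in ranking:
    print(f"{name}: {dist:.2f}")
```

    The ranking only sees the cluster percentages, never the SNPs behind them, which is what the rest of this post is about.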

    PART I: THE SNP PROBLEM
    SNPs (single-nucleotide polymorphisms) are variations at a single position in our DNA that involve only one nucleotide. Scientists have catalogued millions of them, but because sequencing an entire genome is expensive, ancestry companies usually test only a limited set, typically between 600 and 800 thousand SNPs. That makes the test less accurate, but also more affordable.
    The real problem begins when you download your raw data and upload it to external calculators that use fewer SNPs than your raw data contains, so a number that was already tiny compared to the total number of SNPs becomes even smaller. For example, at least in my experience, calculators from GEDmatch used approximately 50 thousand SNPs out of the 700 thousand in my raw data. That's about 14x fewer!
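
    To make the shrinkage concrete, here is a minimal sketch that counts how many SNPs of a raw data file a calculator could actually use. The rsIDs are toy values and `snp_overlap` is a hypothetical helper, not part of any GEDmatch tool. The last lines just redo the arithmetic with the figures quoted above.

```python
def snp_overlap(raw_rsids, calculator_rsids):
    """Return (shared SNP count, shared fraction of the raw data)."""
    raw = set(raw_rsids)
    shared = raw & set(calculator_rsids)
    fraction = len(shared) / len(raw) if raw else 0.0
    return len(shared), fraction


# Toy example: 7 SNPs in the raw file, only one is known to the calculator.
raw_rsids = ["rs1", "rs2", "rs3", "rs4", "rs5", "rs6", "rs7"]
calculator_rsids = ["rs2", "rs9", "rs10"]
shared_count, fraction_of_raw = snp_overlap(raw_rsids, calculator_rsids)
print(shared_count, round(fraction_of_raw, 3))

# Figures from the post: about 50 thousand used out of about 700 thousand.
ratio = 700_000 / 50_000
print(f"The calculator uses about {ratio:.0f}x fewer SNPs than the raw data holds.")
```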

    PART II: THE NUMBER PROBLEM
    OK, ignoring the SNP problem, there's another huge problem with Oracles: the number problem. Going back to how the Oracle works, it compares the numbers your test gives you with the numbers that ethnic groups usually get from the same test, but the problem is that IT'S NOT ALWAYS THE SAME SNPS, JUST THE SAME CLUSTER.

    Imagine we have two SNPs: one is called R and the other S.
    - R can be either (T)hymine or (A)denine, and so can S.
    - R with (T) and S with (T) are both labeled as part of the "Amerindian" cluster.
    - S with (T) is present in East Asian ethnic groups as well as Amerindian ethnic groups.

    If we build a test using only 4 SNPs, for the sake of simplicity, and two of them are R and S, then an individual who has R with (T) will score the same 25% "Amerindian" as an individual who has S with (T), TOTALLY ignoring the fact that the (T) of S is also present in East Asians.
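
    The toy test above can be sketched in a few lines. This assumes one allele per SNP, as in the example, and the two extra SNPs "X" and "Y" are hypothetical fillers to reach 4 SNPs. It is not how any real calculator is implemented.

```python
# SNP -> (allele counted for the cluster, cluster label) in the toy test.
CLUSTER_ALLELES = {
    "R": ("T", "Amerindian"),
    "S": ("T", "Amerindian"),
    "X": ("T", "Amerindian"),  # hypothetical filler SNP
    "Y": ("T", "Amerindian"),  # hypothetical filler SNP
}

# Information the toy score never looks at: S with (T) also occurs elsewhere.
ALSO_FOUND_IN = {"S": ["East Asian"]}


def cluster_score(genotype, cluster="Amerindian"):
    """Percentage of tested SNPs where the person carries the cluster allele."""
    hits = sum(
        1
        for snp, (allele, label) in CLUSTER_ALLELES.items()
        if label == cluster and genotype.get(snp) == allele
    )
    return 100.0 * hits / len(CLUSTER_ALLELES)


person_with_r = {"R": "T", "S": "A", "X": "A", "Y": "A"}
person_with_s = {"R": "A", "S": "T", "X": "A", "Y": "A"}

print(cluster_score(person_with_r))  # R with (T) only
print(cluster_score(person_with_s))  # S with (T) only
```

    Both people get the same score even though they carry different SNPs, and `ALSO_FOUND_IN` plays no part in the result.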

    PART III: SOLUTION?
    I'm coding a calculator that will not have these problems. There is no firm release date yet, as it is still in its initial phase, but I will post a thread when it becomes a reality.

    SOURCES:
    https://medlineplus.gov/genetics/und...cresearch/snp/
    https://en.m.wikipedia.org/wiki/K-means_clustering
    https://beholdgenealogy.com/blog/?p=2700

    EDIT: I'm also going to make this thread a Dev Log for the calculator I'm developing.
    Last edited by Mont; 10-16-2021 at 04:49 AM.

  2. #2
    Veteran Member Apricity Funding Member
    "Friend of Apricity"

    Gallop's Avatar
    Join Date
    Mar 2019
    Last Online
    Yesterday @ 07:17 PM
    Location
    Spain
    Meta-Ethnicity
    Epic, Mythical, Mythological and Biblical
    Ethnicity
    Español
    Ancestry
    Andalusia (Spain)
    Country
    Spain
    Y-DNA
    E-BY7449-E-BY7566
    mtDNA
    J1c5c1
    Gender
    Posts
    11,146
    Thumbs Up
    Received: 7,537
    Given: 4,801


    I sensed that something was wrong


    Ok, keep us posted.
    https://www.yfull.com/tree/E-BY7449/
    E-V22 - E-BY7449 - E-BY7566 - E-FT155550
    According to oral family tradition E-FT155550 comes from a deserter of Napoleon's troops (1808-1813) who stayed in Spain and changed his surname.

  3. #3
    ด้้้้้็็็็็้้้้้็็็็ ็้้้้ ้้้้็็็็็้ Mont's Avatar

    Originally Posted by Gallop:
    I sensed that something was wrong


    Ok, keep us posted.
    I will.

  4. #4
    ด้้้้้็็็็็้้้้้็็็็ ็้้้้ ้้้้็็็็็้ Mont's Avatar

    DEV LOG #1

    - The calculator already has a name: "cHenry" (c for calculator, and Henry because I like that name). That's the current name, though I may change it in the future.

    - In PCA plots made from the genomes of ethnic groups around the world, the first two eigenvectors form a triangle. As a starting point, I will use that triangle as a reference and have the calculator work out your % between Caucasoid, Caucaso-Mongoloid, Mongoloid, Negro-Mongoloid, Negroid and Caucaso-Negroid.
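
    One possible way to turn a position inside such a triangle into percentages is barycentric coordinates. This is only an assumption for illustration, not a description of how cHenry works. The corner coordinates below are invented, and the sketch covers only the three corners, not the intermediate categories.

```python
def barycentric_weights(point, corner_a, corner_b, corner_c):
    """Weights (summing to 1) that express point as a mix of three corners."""
    px, py = point
    ax, ay = corner_a
    bx, by = corner_b
    cx, cy = corner_c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    if det == 0:
        raise ValueError("the three corners are collinear, not a triangle")
    w_a = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    w_b = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    w_c = 1.0 - w_a - w_b
    return w_a, w_b, w_c


# Hypothetical corners of the triangle in (PC1, PC2) space.
corner_a, corner_b, corner_c = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
sample = (0.25, 0.25)  # hypothetical position of one individual

weights = barycentric_weights(sample, corner_a, corner_b, corner_c)
print([round(100 * w, 1) for w in weights])
```

    A point outside the triangle gives a negative weight, so a real calculator would need to decide how to handle that case.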

  5. #5
    Veteran Member Apricity Funding Member
    "Friend of Apricity"


    Join Date
    Oct 2016
    Last Online
    @
    Ethnicity
    me
    Country
    European Union
    Y-DNA
    R1a > YP1337 > R-BY160486*
    mtDNA
    H3*
    Gender
    Posts
    6,066
    Thumbs Up
    Received: 7,243
    Given: 2,623


    Originally Posted by Mont:

    The real problem begins when you download your raw data and upload it to external calculators that use fewer SNPs than your raw data contains, so a number that was already tiny compared to the total number of SNPs becomes even smaller. For example, at least in my experience, calculators from GEDmatch used approximately 50 thousand SNPs out of the 700 thousand in my raw data. That's about 14x fewer!
    Lol. The problem is not GEDmatch but your new raw file: with every new chip version (23andMe v5, for example) the raw data has fewer and fewer SNPs compatible with GEDmatch-based calculators. My old FTDNA raw file has between 150,000 and 200,000 compatible SNPs, depending on the calculator.

  6. #6
    Whip it good oszkar07's Avatar
    Join Date
    Mar 2017
    Last Online
    Yesterday @ 09:47 PM
    Location
    In the Simulation
    Meta-Ethnicity
    Martian From Venus
    Ethnicity
    Hunbritarian
    Ancestry
    TheHuns
    Country
    Austria
    Y-DNA
    I2
    mtDNA
    H1m
    Taxonomy
    Killer
    Politics
    1999
    Hero
    Jesus
    Religion
    Philippians 4.13
    Relationship Status
    Married
    Age
    97
    Gender
    Posts
    5,825
    Thumbs Up
    Received: 8,803
    Given: 13,745


    With all that you have said, it still seems to be the case for many users here that the free calculators that use raw data and show Oracles are often reasonably accurate in relation to people's known ancestry.

    Many users feel they get more information and accuracy from these calculators than from some of the commercial companies.

    The commercial companies' results can vary so much that if you look at the same person's results after each update, there are sometimes significant changes in the ethnicity estimate from one update to the next. Sometimes the update does not make any sense at all.
    In theory your argument should make sense, but when we compare the actual results people get for their known ancestry from commercial companies and online calculators, the online calcs are often better.
    https://vocaroo.com/1f1IYpCqGQPy
    one thing I can tell you is you got to be free

  7. #7
    ด้้้้้็็็็็้้้้้็็็็ ็้้้้ ้้้้็็็็็้ Mont's Avatar

    Originally Posted by Lucas:
    Lol. The problem is not GEDmatch but your new raw file: with every new chip version (23andMe v5, for example) the raw data has fewer and fewer SNPs compatible with GEDmatch-based calculators. My old FTDNA raw file has between 150,000 and 200,000 compatible SNPs, depending on the calculator.
    That's why I said it was a problem I have. I didn't assume it was the same for other people.

  8. #8
    ด้้้้้็็็็็้้้้้็็็็ ็้้้้ ้้้้็็็็็้ Mont's Avatar

    Originally Posted by oszkar07:
    With all that you have said, it still seems to be the case for many users here that the free calculators that use raw data and show Oracles are often reasonably accurate in relation to people's known ancestry.

    Many users feel they get more information and accuracy from these calculators than from some of the commercial companies.

    The commercial companies' results can vary so much that if you look at the same person's results after each update, there are sometimes significant changes in the ethnicity estimate from one update to the next. Sometimes the update does not make any sense at all.
    In theory your argument should make sense, but when we compare the actual results people get for their known ancestry from commercial companies and online calculators, the online calcs are often better.
    I'm not arguing for commercial calculators and against GEDmatch calculators. I'm only offering a critique and showing the disadvantages of the GEDmatch calculators that I noticed.

  9. #9
    Senior Member
    Join Date
    Jul 2021
    Last Online
    02-21-2023 @ 07:59 PM
    Meta-Ethnicity
    Germanic
    Ethnicity
    White American
    Ancestry
    English, German, and Irish
    Country
    United States
    Y-DNA
    R-L21
    mtDNA
    H3a
    Taxonomy
    Long face
    Politics
    Anglo-European Nationalism
    Religion
    Christian
    Gender
    Posts
    729
    Thumbs Up
    Received: 507
    Given: 108


    This has been a big problem for me because of how different my two commercial tests were. In general, my AncestryDNA kit scores significantly differently from my FTDNA kit on GEDmatch-generated Oracles.

    ANCESTRYk13,46.93,23.17,16.83,5.21,3.93,0.7,0,0.33,0.49,0.51,0.48,0,1.42
    SNPs used in this evaluation: 170544.

    FTDNAk13,46.32,23.3,16.17,4.01,5.89,1.19,0.00,0.73,0.23,0.86,0.00,0.00,1.3
    SNPs used in this evaluation: 77936.

  10. #10
    ด้้้้้็็็็็้้้้้็็็็ ็้้้้ ้้้้็็็็็้ Mont's Avatar

    Originally Posted by SouthDutch7991:
    This has been a big problem for me because of how different my two commercial tests were. In general, my AncestryDNA kit scores significantly differently from my FTDNA kit on GEDmatch-generated Oracles.

    ANCESTRYk13,46.93,23.17,16.83,5.21,3.93,0.7,0,0.33,0.49,0.51,0.48,0,1.42
    SNPs used in this evaluation: 170544.

    FTDNAk13,46.32,23.3,16.17,4.01,5.89,1.19,0.00,0.73,0.23,0.86,0.00,0.00,1.3
    SNPs used in this evaluation: 77936.
    If you want a "more accurate" score, I recommend comparing both raw data files, checking which SNPs the FTDNA kit covers that the AncestryDNA one does not, and then taking a weighted average for a final score.
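
    A minimal sketch of such a weighted average, using the two K13 results quoted above. Weighting each kit by the number of SNPs used in its evaluation is an assumption made here for simplicity. It skips the step of comparing which SNPs differ between the two files, and the 13 components are treated by position because the cluster names are not given in the thread.

```python
# K13 percentages and SNP counts as posted in the thread.
ANCESTRY_K13 = [46.93, 23.17, 16.83, 5.21, 3.93, 0.7, 0, 0.33, 0.49, 0.51, 0.48, 0, 1.42]
FTDNA_K13 = [46.32, 23.3, 16.17, 4.01, 5.89, 1.19, 0.00, 0.73, 0.23, 0.86, 0.00, 0.00, 1.3]
ANCESTRY_SNPS = 170544
FTDNA_SNPS = 77936


def weighted_average(scores_a, weight_a, scores_b, weight_b):
    """Component-wise average of two score lists, weighted by weight_a and weight_b."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both results must have the same number of components")
    total = weight_a + weight_b
    return [
        (a * weight_a + b * weight_b) / total
        for a, b in zip(scores_a, scores_b)
    ]


combined = weighted_average(ANCESTRY_K13, ANCESTRY_SNPS, FTDNA_K13, FTDNA_SNPS)
print([round(value, 2) for value in combined])
```

    Because both inputs sum to 100, the combined result does too, and each component lands between the two original values, closer to the kit with more SNPs.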
