Global25 nMonte and PCA's



Gründig
09-19-2018, 10:35 PM
This is with my converted V3 data, which is apparently comparable to FTDNA raw data (one of the best DNA tests for raw data).

Here are two pretty solid fits, between 0.5% and 0.9%, using nMonte with R.

The first uses unscaled individual samples, and I took out all the known southern Lombard samples.

"distance%=0.7547"

AS_G25

Italy_Medieval_Collegno,51.2 (Lombards)
Hungary_Medieval_Szolad,37.6 (Lombards)
Germany_Medieval,8.6 (Germanic Bavarian)
Sweden_Viking_Age,2.6


Here's a pretty simple run with modern populations, using unscaled individual samples:

"distance%=0.6157"

AS_G25

German,67.8
Dutch,15
French,13.2
Austrian,4

Gründig
09-20-2018, 02:49 PM
Celtic vs Germanic:

"John_Smith" black dot

http://i.imgur.com/I5Ot31Q.jpg

Gründig
09-21-2018, 02:13 PM
I figured more people would have done the global25.

Teutonski
09-21-2018, 02:15 PM
Übermensch DNA :thumb001:

Gründig
09-21-2018, 02:28 PM
Übermensch DNA :thumb001:

The tests are pretty cool. You can compare yourself against ancient populations, like I did at the top with the Lombard Germanic tribes.

The labels "Hungary_Medieval_Szolad" and "Italy_Medieval_Collegno" can be misleading. Those are just the locations where these samples were found.

Teutonski
09-21-2018, 02:33 PM
The tests are pretty cool. You can compare yourself against ancient populations, like I did at the top with the Lombard Germanic tribes.

The labels "Hungary_Medieval_Szolad" and "Italy_Medieval_Collegno" can be misleading. Those are just the locations where these samples were found.

How did you do the test?

Gründig
09-21-2018, 02:36 PM
How did you do the test?

You send your raw data to David at Eurogenes and he sends you back coordinates.

You can then run the models yourself from these coordinates in R, or you can use poi's online nMonte calculator.

Teutonski
09-21-2018, 02:39 PM
You send your raw data to David at Eurogenes and he sends you back coordinates.

You can then run the models yourself from these coordinates in R, or you can use poi's online nMonte calculator.

complicated?

Gründig
09-21-2018, 02:41 PM
complicated?

With R, it can be at first.

With poi's calculator, it's much easier. You're just more limited but it's still great.

I've learned to do both, so if you ever decide to get the coordinates ($12), I could help you out.

Gründig
09-24-2018, 03:20 AM
I just found out that using scaled coordinates for the models is best. The distance to aim for is between 1% and 2%.

Here's an ancient scaled run with all the southern Lombards removed:

"distance%=1.4404"

AS_G25_scaled

Hungary_Medieval_Szolad,45.8 (Lombards)
Italy_Medieval_Collegno,34.6 (Lombards)
Germany_Medieval,18.2 (Bavarian Germanic)
Sweden_Viking_Age,1.4

Chaos One
09-24-2018, 03:28 AM
I did it with K36 Ancient nMonte, got nice results. The archives that I found had like 10 different populations to mix, so you could get from HG to Chalcolithic or Bronze Age to High Medieval Age, etc.

Gründig
09-24-2018, 01:00 PM
I did it with K36 Ancient nMonte, got nice results. The archives that I found had like 10 different populations to mix, so you could get from HG to Chalcolithic or Bronze Age to High Medieval Age, etc.

What are these archives?

Global25 has a decent amount of samples.

Grace O'Malley
09-24-2018, 01:24 PM
Can you model me using modern populations and then ancients, if possible? If not let me know. I have access to poi's tool but you have to select the populations on that. Thanks in advance.

,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,PC11,PC12,PC13,PC14,PC15,PC16,PC17,PC18,PC19,PC20,PC21,PC22,PC23,PC24,PC25
Grace_scaled,0.138864,0.136081,0.064488,0.052003,0.036314,0.017291,0.001645,0.006231,0.004909,-0.00328,-0.009094,0.004646,-0.011298,-0.009771,0.029994,0.005701,-0.007041,-0.004434,0.000377,0.007879,0.007112,0.007296,-0.005793,0.010845,-0.000479

,PC1,PC2,PC3,PC4,PC5,PC6,PC7,PC8,PC9,PC10,PC11,PC12,PC13,PC14,PC15,PC16,PC17,PC18,PC19,PC20,PC21,PC22,PC23,PC24,PC25
Grace_unscaled,0.0122,0.0134,0.0171,0.0161,0.0118,0.0062,0.0007,0.0027,0.0024,-0.0018,-0.0056,0.0031,-0.0076,-0.0071,0.0221,0.0043,-0.0054,-0.0035,0.0003,0.0063,0.0057,0.0059,-0.0047,0.009,-0.0004
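
For anyone wondering how the scaled and unscaled rows relate, here is a small Python sketch that divides one by the other, PC by PC, using only the two rows posted above. The factors it prints are inferred from this one pair of rows, and the unscaled values are rounded to four decimals, so the ratios for the later PCs are rough at best. It is only meant to show why distances from scaled and unscaled runs are not on the same footing.

```python
import math

# Grace's two coordinate rows exactly as posted above.
scaled = [0.138864, 0.136081, 0.064488, 0.052003, 0.036314, 0.017291,
          0.001645, 0.006231, 0.004909, -0.00328, -0.009094, 0.004646,
          -0.011298, -0.009771, 0.029994, 0.005701, -0.007041, -0.004434,
          0.000377, 0.007879, 0.007112, 0.007296, -0.005793, 0.010845,
          -0.000479]
unscaled = [0.0122, 0.0134, 0.0171, 0.0161, 0.0118, 0.0062, 0.0007,
            0.0027, 0.0024, -0.0018, -0.0056, 0.0031, -0.0076, -0.0071,
            0.0221, 0.0043, -0.0054, -0.0035, 0.0003, 0.0063, 0.0057,
            0.0059, -0.0047, 0.009, -0.0004]

# Per-PC ratio of scaled to unscaled (approximate because of rounding).
ratios = [s / u for s, u in zip(scaled, unscaled)]

# Length of each coordinate vector, i.e. its distance from the origin.
norm_scaled = math.sqrt(sum(x * x for x in scaled))
norm_unscaled = math.sqrt(sum(x * x for x in unscaled))

for pc, ratio in enumerate(ratios, start=1):
    print(f"PC{pc}: scaled/unscaled = {ratio:.2f}")
print(f"vector length, scaled: {norm_scaled:.4f}, unscaled: {norm_unscaled:.4f}")
```

In these rows the first two PCs are stretched by roughly 10 to 11 times while PC3 to PC5 are stretched by only about 3 to 4 times, so the leading PCs count for much more in a scaled distance.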

Gründig
09-24-2018, 01:29 PM
I actually learned that scaled is the way to go, so I was going to run some more on myself after work. I can do some for you as well.

I know you're Irish but are you anything else? Just so I have a direction to go in.

Grace O'Malley
09-24-2018, 01:34 PM
I actually learned that scaled is the way to go, so I was going to run some more on myself after work. I can do some for you as well.

I know you're Irish but are you anything else? Just so I have a direction to go in.

Thank you. All my ancestry is Irish. Do you have to actually pick populations to model yourself on or does the program pick what matches you best?

Gründig
09-24-2018, 01:38 PM
Thank you. All my ancestry is Irish. Do you have to actually pick populations to model yourself on or does the program pick what matches you best?

You have to pick all the populations you want to compare yourself to manually.
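
To give a feel for what happens once the populations are picked, here is a rough Python sketch of a mixture fit. It is not the nMonte code, just a simple hill climb under my own assumptions: the three populations and their 3-D coordinates are made up for illustration (real Global25 rows have 25 columns), and the names fit_mixture and mix_distance are mine. The target is built as an exact 70/30 blend of the first two populations so the result can be checked.

```python
import math
import random

def mix_distance(weights, sources, target):
    """Euclidean distance between the weighted average of sources and the target."""
    mix = [sum(w * src[d] for w, src in zip(weights, sources))
           for d in range(len(target))]
    return math.sqrt(sum((m - t) ** 2 for m, t in zip(mix, target)))

def fit_mixture(sources, target, slots=100, steps=20000, seed=1):
    """Fill `slots` with population indices, then swap one slot at a time,
    keeping a swap only when it brings the mix closer to the target."""
    rng = random.Random(seed)
    n = len(sources)
    assignment = [rng.randrange(n) for _ in range(slots)]

    def weights_of(a):
        return [a.count(i) / slots for i in range(n)]

    best = mix_distance(weights_of(assignment), sources, target)
    for _ in range(steps):
        slot = rng.randrange(slots)
        old, new = assignment[slot], rng.randrange(n)
        if new == old:
            continue
        assignment[slot] = new
        d = mix_distance(weights_of(assignment), sources, target)
        if d < best:
            best = d                  # keep the improvement
        else:
            assignment[slot] = old    # undo the swap
    return weights_of(assignment), best

# Hypothetical population averages, NOT real Global25 values.
names = ["Pop_A", "Pop_B", "Pop_C"]
sources = [(0.10, 0.02, 0.00), (0.02, 0.10, 0.01), (-0.05, -0.03, 0.08)]
# Target constructed as 70% Pop_A + 30% Pop_B.
target = tuple(0.7 * a + 0.3 * b for a, b in zip(sources[0], sources[1]))

weights, distance = fit_mixture(sources, target)
for name, w in zip(names, weights):
    print(f"{name},{100 * w:.1f}")
print(f"distance={distance:.6f}")
```

The point is that the program only ever mixes the populations it is given. It cannot tell you that a better source is missing from the list.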

Grace O'Malley
09-24-2018, 01:44 PM
You have to pick all the populations you want to compare yourself to manually.

So it is similar to using poi's tool. My closest population is actually Icelandic on this oddly enough so I usually get a chunk of Icelandic. I've had people do modelling for me previously. All my ancestry is Irish but I usually get some Scandinavian which could be deep ancestry because I have no known recent ancestors from there.

Gründig
09-24-2018, 01:50 PM
So it is similar to using poi's tool. My closest population is actually Icelandic on this oddly enough so I usually get a chunk of Icelandic. I've had people do modelling for me previously. All my ancestry is Irish but I usually get some Scandinavian which could be deep ancestry because I have no known recent ancestors from there.

It is, you can just run more populations without paying. The distances also seem to vary a little between the two programs but nothing crazy.

Yeah, I can get some closer distances with some nonsense combinations. However, sometimes a population has samples that fall within another population. For example, my closest single sample is French, but when I run the test, German always comes out on top. In other words, that French sample clusters with the Germans.

Token
09-24-2018, 02:01 PM
You are overfitting.

Grace O'Malley
09-24-2018, 02:09 PM
This is what I get for Irish, English and Icelandic

Fit 0.8221
English 5.83
Icelandic 31.67
Irish 62.5

But I get a closer distance by substituting Dutch for English

Fit 0.7667
Dutch 17.5
Icelandic 29.17
Irish 53.33

So obviously the closer the fit the better, but I doubt I really have Icelandic or Dutch ancestry. I just don't match England very well.

Your population match above has a really low distance. Is that your actual known ancestry? It looks like a really good model for you.

Grace O'Malley
09-24-2018, 02:11 PM
You are overfitting.

Is that using populations that are too similar?

Token
09-24-2018, 02:27 PM
Is that using populations that are too similar?

Basically.

Gründig
09-24-2018, 02:29 PM
You are overfitting.

For which? And what populations am I supposed to use then? The moderns are my actual ancestry and the ancients align with that pretty well.

Grace O'Malley
09-24-2018, 02:30 PM
Basically.

Could you give an example of a good model to use? People have to go with their known ancestry as well.

Grace O'Malley
09-24-2018, 02:34 PM
If I go with the populations from the Irish DNA Atlas re French and Norwegian making up the majority of the Irish genepool this is what I get.

Fit 1.6527
French_Average 31.67
Norwegian 68.33

Gründig
09-24-2018, 02:38 PM
Could you give an example of a good model to use? People have to go with their known ancestry as well.

Yeah, I've seen different definitions of overfitting, so I'm not really sure what it means.

I learned the key is to use realistic samples of your known ancestry and to stay within 4-5 populations. I'm not really sure what I could do differently. I meet those constraints.

I could get a decent amount closer but it would become unrealistic.

Chaos One
09-24-2018, 05:10 PM
What are these archives?

Global25 has a decent amount of samples.

https://anthrogenica.com/showthread.php?11663-K36-based-oracle-with-ancients/page13

Chaos One
09-24-2018, 05:12 PM
For example, my Bronze Age to Early Medieval results:

BA_Hungary_RISE254 22.45
BA_I9041_Mycenaean 16.35
Egyptian_mumy_I_JK2888_ 15.10
Amerindian_Kennewick 12.75
IA_EastKazachstan_Is2 8.95
IA_Wielbark_Kow_26_PL 6.30
MBA_ATP9_Iberia 5.80
LBA_Armenia_RISE396 4.30
BA_Hungary_RISE371 3.20
BA_Hungary_RISE374 3.00
HG_Ethiopia_Mota1 1.80

Imperator Biff
09-24-2018, 06:57 PM
I just found out using scaled coordinates for the models is best. The distance to aim for is between 1% and 2%.

The optimum target distance is actually closer to 4%; anything lower than that means the model is overfitted.

Gründig
09-24-2018, 07:06 PM
The optimum target distance is actually closer to 4%; anything lower than that means the model is overfitted.

This is false. I've spoken to David about it before. For scaled coordinates, a solid fit is between 1% and 2% with 4 to 5 populations being used. He did say that distances up around 4% can sometimes be acceptable as well.

If what you were saying was true, I would be overfitting drastically using the German populations alone....

Raizen
09-25-2018, 12:36 AM
This is false. I've spoken to David about it before. For scaled coordinates, a solid fit is between 1% to 2% with 4 to 5 populations being used. He did say that distances up in the 4% area can sometimes be acceptable as well.

If what you were saying was true, I would be over fitting drastically using the Germany populations alone....

Does a model like this look solid? I think it fits my background pretty well.

"distance%=2.268"

Portuguese_Madeira,78.8
Yoruban,12
Ashkenazi,6
Karitiana,3.2

Token
09-25-2018, 12:08 PM
For which? And what populations am I supposed to use then? The moderns are my actual ancestry and the ancients align with that pretty well.

You are using source populations with too much overlap, especially in your first model. The Collegno and Szólad samples, for example, are basically identical. You should aim for 1-2%; anything lower than that and you are already overfitting your model.

By the way, Global25 is not the best for modelling recent ancestry, their spreadsheet is severely lacking on this matter.
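
A quick way to check for that kind of overlap before running a model is to measure how far apart the source populations are from each other. Here is a minimal Python sketch. The three sources and their coordinates are hypothetical placeholders (not the real Collegno or Szólad averages), and the 0.01 cutoff is an arbitrary number picked for the example, not an established threshold.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Straight-line distance between two coordinate vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def overlapping_pairs(sources, threshold):
    """Return (name_a, name_b, distance) for every pair closer than threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(sources.items(), 2):
        d = euclidean(a, b)
        if d < threshold:
            flagged.append((name_a, name_b, d))
    return flagged

# Hypothetical averages in 3 dimensions; real rows would have 25 columns.
sources = {
    "Source_X": (0.120, 0.130, 0.050),
    "Source_Y": (0.122, 0.128, 0.051),  # deliberately almost on top of Source_X
    "Source_Z": (0.100, 0.150, 0.020),
}

pairs = overlapping_pairs(sources, threshold=0.01)
for name_a, name_b, d in pairs:
    print(f"{name_a} and {name_b} are only {d:.4f} apart")
```

When two sources sit that close together, the split between them in the output is close to arbitrary, because swapping one for the other barely moves the fit.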

Grace O'Malley
09-25-2018, 01:22 PM
You are using source populations with too much overlap, especially in your first model. The Collegno and Szólad samples, for example, are basically identical. You should aim for 1-2%; anything lower than that and you are already overfitting your model.

By the way, Global25 is not the best for modelling recent ancestry, their spreadsheet is severely lacking on this matter.

What's the best for modelling recent ancestry?

Token
09-25-2018, 01:40 PM
What's the best for modelling recent ancestry?

K13 or K36 spreadsheet. You need to do everything manually but it compensates you with much more flexibility. G25 is completely focused on ancient samples.

Gründig
09-25-2018, 02:55 PM
K13 or K36 spreadsheet. You need to do everything manually but it compensates you with much more flexibility. G25 is completely focused on ancient samples.

Didn't David create the Eurogenes K13 as well? Are the results that different? My GEDmatch K13 actually aligns pretty well with the nMonte Global25 runs I did.

Either way, how do you go about running the K13 or K36 spreadsheets through nMonte?


You are using source populations with too much overlap, especially in your first model. The Collegno and Szólad samples, for example, are basically identical. You should aim for 1-2%; anything lower than that and you are already overfitting your model.

By the way, Global25 is not the best for modelling recent ancestry, their spreadsheet is severely lacking on this matter.

Those first runs at the top of this thread used unscaled samples (I recently learned that I should use scaled), and with unscaled coordinates a single sample alone would already put me below a distance of about 1.5. So I asked around and was told to aim for 0.5% to 0.9% with unscaled. So my runs were within the constraints. With scaled, however, you aim for 1% to 2%, as you said.

Grace O'Malley
09-25-2018, 03:13 PM
This is most likely the most accurate as far as what could contribute to my ancestry but not the lowest number. I'm not sure what else I can accurately add.

Fit 0.8652
Irish 70.83
Norwegian 17.5
Scottish 11.67

If I add French to this I get 0.

Fit 0.8647
French 0
Irish 70.83
Norwegian 17.5
Scottish 11.67

Not really sure of what else I can add realistically.

If I remove Irish it mostly goes to Scottish.

Fit 1.2681
French 5
Norwegian 24.17
Scottish 70.83

Gründig
09-25-2018, 03:24 PM
This is all on the online tool? I specifically asked David what to aim for using this tool and he said 4-5 populations and between 1% to 2%.

So that looks good. The fact you're below 1% in this case would be most likely considered "over fitting".

Sorry I haven't run your results yet, things got really busy yesterday.

Gründig
09-25-2018, 03:26 PM
Does a model like this look solid? I think it fits my background pretty well.

"distance%=2.268"

Portuguese_Madeira,78.8
Yoruban,12
Ashkenazi,6
Karitiana,3.2

Was this on poi's tool or nMonte with R? Scaled or unscaled?

Grace O'Malley
09-25-2018, 03:31 PM
This is what Aren kindly did for me previously. It most likely is a better model than I've done for myself.

"distance%=1.7153"

Grace_scaled

Irish,55.2
Welsh,24.2
Swedish,20.6

"distance%=1.6364"

Grace_scaled

Icelandic,44.4
Irish,29.6
Welsh,26

Token
09-25-2018, 03:37 PM
Was this on poi's tool or nMonte with R? Scaled or unscaled?

I ran this model for him with nMonte + K13. The advantage was that I could choose between several Portuguese regions, something that is not possible with G25. In his case, using Central Portuguese or Madeiran samples improved the fit significantly. I couldn't get below a 2.25% distance, probably because he is too mixed.

Gründig
09-25-2018, 03:40 PM
I ran this model for him with nMonte + K13. The advantage was that I could choose between several Portuguese regions, something that is not possible with G25. In his case, using Central Portuguese or Madeiran samples improved the fit significantly. I couldn't get below a 2.25% distance, probably because he is too mixed.

How are you doing this? I would like to try.

I know with G25 you use the coordinates in a target file and pop data in a different file but I'm not sure how to set it up without using these.

Gründig
09-27-2018, 01:46 AM
Here's one of David's PCA's. I added the colors.

Black star = Me
Yellow = German
Green = Dutch
Dark blue = East French
Unfilled squares = Hungary_Medieval_Szolad (Lombard)
Filled squares = Italy_Medieval_Collegno (Lombard)

My dot is overlapped by the Germans, Dutch and just about East French. I'm also surrounded by Lombard samples.


http://i.imgur.com/kHM89zH.jpg

Gründig
09-27-2018, 11:07 PM
I'm the black + sign:

http://i.imgur.com/Z1s9yxR.jpg

Gründig
09-28-2018, 12:22 PM
Bump

Gründig
09-30-2018, 06:07 PM
I had this K13 nMonte run done for me. It seems they used all the populations, which is why there are so many small numbers. I cut off anything under 1%.

I would like to have a run done with only 4-5 populations.


West_German 0.3901679
South_Dutch 0.5222337
Austrian 0.6044510
East_German 0.8247824
North_German 0.9332004
French 0.9721312
Southeast_English 1.1061370
Hungarian 1.1110090
Danish 1.1745510
North_Dutch 1.1901949

"distance%=0.1088" (1.088)

West_German,39.4
South_Dutch,9.2
Austrian,4.2
North_German,3.8
East_German,3.2
Norwegian,3.2
Southwest_English,2.2
North_Dutch,2
Hungarian,1.8
Irish,1.8
Spanish_Cataluna,1.8
South_Polish,1.6
West_Scottish,1.6
Romanian,1.2
Belorussian,1
French,1
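
Since everything under 1% was cut, it is worth checking how much of the model the listed rows still cover. This short Python sketch just adds up the percentages shown above. It assumes the full output summed to 100%, which is how these runs are normally reported but is not shown in the post itself.

```python
# Percentages of 1% or more from the K13 run above, as posted.
shown = {
    "West_German": 39.4, "South_Dutch": 9.2, "Austrian": 4.2,
    "North_German": 3.8, "East_German": 3.2, "Norwegian": 3.2,
    "Southwest_English": 2.2, "North_Dutch": 2.0, "Hungarian": 1.8,
    "Irish": 1.8, "Spanish_Cataluna": 1.8, "South_Polish": 1.6,
    "West_Scottish": 1.6, "Romanian": 1.2, "Belorussian": 1.0,
    "French": 1.0,
}

total_shown = round(sum(shown.values()), 1)
# Assumption: the complete run sums to 100%, so the rest is the cut-off rows.
hidden = round(100 - total_shown, 1)

print(f"shown: {total_shown}%  cut off (<1% each): {hidden}%")
```

So the rows listed add up to 79%, which means about a fifth of this model is spread over populations at less than 1% each. That is another reason to rerun it with only 4-5 populations.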

Gründig
10-03-2018, 03:28 PM
Here's another K13 nMonte that the same person ran for me. Looks like all the populations were used again, so I cut out anything lower than 1%.

Using pen=0

"distance%=0.0448" (0.448)

Austrian,23
North_Dutch,16.2
Norwegian,13.8
South_Dutch,12.2
South_Polish,6.2
Serbian,6
North_Italian,5.6
Spanish_Cataluna,4.2
Danish,2.8
Tuscan,2.2
French_Basque,2
Southwest_English,1.8


I'm not positive how pen=0 works. The K13 and K13 pen=0 have their similarities but many differences as well. A couple strange things popped up in the pen=0. Interesting nonetheless.
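
My understanding is that pen is a penalty on using sources that are far from the target, and that pen=0 switches it off. The Python sketch below shows that general idea only. The formula is my own assumption of one plausible form (fit term plus pen times the weighted squared distance of each source from the target) and may not match what nMonte actually computes. The positions are made-up 1-D numbers.

```python
def penalized_score(weights, positions, target, pen):
    """Lower is better. Hypothetical objective, not the actual nMonte formula."""
    mix = sum(w * x for w, x in zip(weights, positions))
    fit_term = (mix - target) ** 2
    # Weighted squared distance of each individual source from the target.
    spread_term = sum(w * (x - target) ** 2 for w, x in zip(weights, positions))
    return fit_term + pen * spread_term

def compare(pen, target=0.0):
    # Two distant sources on opposite sides that average exactly onto the target.
    far_mix = penalized_score([0.5, 0.5], [-1.0, 1.0], target, pen)
    # One source sitting close to the target, but not exactly on it.
    near_single = penalized_score([1.0], [0.05], target, pen)
    return far_mix, near_single

for pen in (0.0, 0.01):
    far_mix, near_single = compare(pen)
    winner = "far mix" if far_mix < near_single else "near source"
    print(f"pen={pen}: far mix={far_mix:.6f}, near source={near_single:.6f}, winner: {winner}")
```

If it works anything like this, pen=0 lets distant populations that happen to average out onto the target beat the nearby ones, which would explain the lower distance and some of the stranger entries in that run.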