qpAdm thread

**Kaspias** · 03-02-2021, 09:34 PM

Originally Posted by Zoro

Looking much better. Standard errors look good at 3%. Can you post the p-values so we can see whether the models pass or fail? The 3rd row contains the p-value. Also, can you post the p-right pops you used?

Bulgarian + Uzbek,

Code:

  f4rank  dof   chisq          p  dofdiff  chisqdiff   p_nested
1      1   14    11.1    6.75e-1       16      1483.  2.95e-306
2      0   30   1494.  8.98e-296       NA         NA         NA

Bulgarian + Turkmen,

Code:

  f4rank  dof   chisq          p  dofdiff  chisqdiff   p_nested
1      1   14    12.7    5.50e-1       16      1029.  5.62e-209
2      0   30   1042.  6.54e-200       NA         NA         NA

Right pops:

Code:

"Papuan.DG",
"Eskimo_Sireniki.DG",
"Jordanian.DG",
"Punjabi.DG",
"Yakut.DG",
"Polish.DG",
"Yoruba.DG",
"Sardinian.DG",
"Finnish.DG",
"Armenian.DG",
"Greek_1.DG",
"Tatar_Volga.SG",
"Iranian.DG",
"Estonian.DG",
"Altaian.DG",
"Uzbek.SG"("Turkmen.SG")

**Zoro** · 03-02-2021, 09:51 PM

Unfortunately both models fail, since the p-value of the 1st is 2.95e-306 and the 2nd is about the same. For a pass, your p-values should be > 0.05. Remove a few of the p-rights to improve the p-value.
You may also need another source; think about a third source that would help.

**Zoro** · 03-02-2021, 10:40 PM

Researchers use ancients for right pops. I have had good luck with this set, and they don't have too many missing genotypes. If you are missing some of these samples you can substitute something similar.

right = c('Khomani_San', 'Devils-Gate-N', 'Bichon', 'Morocco_Iberomaurusian',
          'Anatolia_N', 'Kotias', 'Karelia', 'Yana-UP', 'Iran_N', 'Kolyma-Mesol')

My Devils-Gate, Yana and Kolyma are WGS but you can use diploids if you have them.

Your p-values should improve a lot.

Also, not everyone can be modeled successfully with just 2 sources. For example, many Kurds can be modeled with just 2 sources, but Armenians and Iranians appear to have more complex histories, and I usually need at least 3 sources for them. I'm not sure about your situation.

**Kaspias** · 03-03-2021, 12:18 PM

Originally Posted by Zoro

Unfortunately both models fail, since the p-value of the 1st is 2.95e-306 and the 2nd is about the same. For a pass, your p-values should be > 0.05. Remove a few of the p-rights to improve the p-value.
You may also need another source; think about a third source that would help.

I used the exact same populations you recommended, except for Turkmen, which I replaced with MA2196, and this is what I get:

Code:

  target   left                 weight      se     z
1 Kaspias  Bulgarian.DG          0.712  0.0923  7.71
2 Kaspias  Turkey_Ottoman_2.SG   0.288  0.0923  3.12

  f4rank  dof  chisq         p  dofdiff  chisqdiff  p_nested
1      1    8   6.59   5.82e-1       10       87.0  2.10e-14
2      0   18  93.6   3.26e-12       NA         NA        NA

  pat  wt  dof  chisq  p            f4rank  Bulgarian.DG  Turkey_Ottoman_2.SG  feasible  best  dofdiff  chisqdiff  p_nested
1 00    0    8   6.59  0.582             1         0.712                0.288  TRUE      NA         NA         NA        NA
2 01    1    9  24.4   0.00366           0         1                       NA  TRUE      TRUE        0      -23.9         1
3 10    1    9  48.4   0.000000218       0        NA                    1      TRUE      TRUE       NA         NA        NA

If I understand correctly, the model still does not pass. What's the reason? I could add a 3rd population here, Greek or Crimean Tatar, both of which are plausible for me, but Greek will cause overfitting alongside Bulgarian, and there is no Crimean Tatar in the spreadsheet.

In addition, the SNP coverage dropped sharply once I left the Simons dataset:

! 29131 SNPs remain after filtering. 27980 are polymorphic.

**andre** · 03-03-2021, 01:21 PM

Try to do it with Tuscan, Ukrainian and Turkmen.

**Kaspias** · 03-03-2021, 02:27 PM

Originally Posted by andre

Try to do it with Tuscan, Ukrainian and Turkmen.

Tuscan is too northern for the base Balkan admixture of Thrace; I need something in between Apulia and Islander Greek instead.

I got almost no additional Slavic:

Code:

target      left        weight     se     z
                    
1 Kaspias Tuscan_1.DG 0.721  0.180  4.01 
2 Kaspias Polish.DG   0.0334 0.172  0.194
3 Kaspias Turkmen.SG  0.246  0.0398 6.18

Besides, I ran these:

Code:

 target       left                     weight     se     z
                                  
1 Bulgarian.DG Hungary_Avar_5            0.391 0.349   1.12
2 Bulgarian.DG Bulgaria_IA               0.457 0.281   1.63
3 Bulgarian.DG Russia_Medieval_Nomad.SG  0.152 0.0781  1.95


 f4rank   dof  chisq        p dofdiff chisqdiff  p_nested
                       
1      2     7   9.69 2.07e- 1       9      24.5  3.58e- 3
2      1    16  34.2  5.13e- 3      11     319.   8.11e-62
3      0    27 353.   1.47e-58      NA      NA   NA

Code:

 target left                     weight     se     z
                            
1 Gagauz Hungary_Avar_5            0.421 0.142   2.96
2 Gagauz Bulgaria_IA               0.429 0.118   3.64
3 Gagauz Russia_Medieval_Nomad.SG  0.151 0.0394  3.83

f4rank   dof  chisq        p dofdiff chisqdiff  p_nested
                       
1      2     7   3.61 8.23e- 1       9      25.2  2.74e- 3
2      1    16  28.8  2.51e- 2      11     316.   4.07e-61
3      0    27 345.   8.25e-57      NA      NA   NA

Code:

target   left                     weight     se     z
                              
1 Romanian Hungary_Avar_5            0.506 0.183   2.77
2 Romanian Bulgaria_IA               0.368 0.148   2.48
3 Romanian Russia_Medieval_Nomad.SG  0.126 0.0471  2.68

  f4rank   dof  chisq        p dofdiff chisqdiff  p_nested
                       
1      2     7   6.68 4.63e- 1       9      24.8  3.17e- 3
2      1    16  31.5  1.16e- 2      11     316.   3.78e-61
3      0    27 347.   2.22e-57      NA      NA   NA

The same model on me:

Code:

 target      left                     weight     se     z
                                 
1 Kaspias Hungary_Avar_5           0.0528 0.234  0.225
2 Kaspias Bulgaria_IA              0.607  0.191  3.18 
3 Kaspias Russia_Medieval_Nomad.SG 0.341  0.0677 5.03

 f4rank   dof  chisq        p dofdiff chisqdiff  p_nested
                       
1      2     7   4.68 6.98e- 1       9      28.3  8.44e- 4
2      1    16  33.0  7.39e- 3      11     298.   1.94e-57
3      0    27 331.   3.88e-54      NA      NA   NA

**Zoro** · 03-03-2021, 02:31 PM

You can increase the 29K SNPs a lot by using the 1240K SNP Reich set.

Let's first figure out which populations you are genetically closest to by running F2s. This will also tell us whether your personal data somehow got corrupted. Unlike me, don't use ancients, so that your SNP count stays up.

When I run F2s for Bulgarians using 200K SNPs I get the following, but I'm not using a lot of pops that are more relevant to Bulgarians, such as Hungarians, Greeks, etc., which you should use. In fact, you can use all of the 30 or so Simons pops in your dataset.

POP1	POP2	F2	SE	Z
Bulgarian	Sardinian	0.246	0.0010	258
Bulgarian	Estonian	0.247	0.0008	313
Bulgarian	Armenian	0.249	0.0008	296
Bulgarian	Georgian	0.249	0.0007	358
Bulgarian	Turkish-Kayseri	0.249	0.0007	371
Bulgarian	Tatar-Volga	0.25	0.0008	328
Bulgarian	Saami	0.25	0.0007	343
Bulgarian	Iran-Hasanlu-IA	0.251	0.0011	239
Bulgarian	Iranians-Fars	0.252	0.0015	168
Bulgarian	Karelia-EHG	0.252	0.0012	212
Bulgarian	Kotias-CHG	0.252	0.0009	291
Bulgarian	Kalash	0.252	0.0008	304
Bulgarian	Bashkir	0.252	0.0007	371
Bulgarian	Pathan	0.253	0.0009	268
Bulgarian	Jordanian	0.253	0.0009	288
Bulgarian	Villabruna-UP-WHG	0.254	0.0010	256
Bulgarian	Turkmen	0.254	0.0009	291
Bulgarian	Balochi	0.254	0.0008	301
Bulgarian	Brahui	0.254	0.0007	363
Bulgarian	MA1-ANE	0.257	0.0009	274
Bulgarian	Punjabi	0.257	0.0009	296
Bulgarian	Yana-UP-WGS	0.258	0.0008	336
Bulgarian	Devils-Gate-N-WGS	0.259	0.0008	316
Bulgarian	Kolyma-Mesol-WGS	0.261	0.0011	240
Bulgarian	Saharawi	0.261	0.0010	261
Bulgarian	Eskimo-Sireniki	0.261	0.0008	324
Bulgarian	Eskimo-Chaplin	0.262	0.0011	237
Bulgarian	China-Tianyuan-UP	0.267	0.0012	215
Bulgarian	UstIshim-UP	0.269	0.0010	263
Bulgarian	Khomani-San	0.313	0.0013	245

Running F2s is simple. Do this:

## Increase number of lines R prints
options(max.print = 100000)

extract_f2(pref, f2dir, pops = c(..........

f2_blocks = f2_from_precomp('............

##View(f2(f2_blocks))
print(f2(f2_blocks), n = 2000)
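
Once the F2 table is printed, ranking it by the estimate gives the "closest first" ordering used in the Bulgarian table above. A minimal base-R sketch, using a handful of rows copied from that table; the data frame `f2_tab` and its column names are made up for illustration and are not claimed to match what `f2()` returns:

```r
# Hypothetical data frame holding a few rows of the Bulgarian F2 table above
f2_tab <- data.frame(
  pop1 = "Bulgarian",
  pop2 = c("Khomani-San", "Sardinian", "Turkmen", "Estonian", "Armenian"),
  f2   = c(0.313, 0.246, 0.254, 0.247, 0.249),
  se   = c(0.0013, 0.0010, 0.0009, 0.0008, 0.0008),
  stringsAsFactors = FALSE
)

# Smaller F2 means less drift between the two populations, so sort ascending
closest <- f2_tab[order(f2_tab$f2), ]
rownames(closest) <- NULL
print(closest)
```

Differences of about one standard error between neighbouring rows (Sardinian vs Estonian here) are probably best read as ties rather than as a real ranking.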

**Zoro** · 03-03-2021, 02:41 PM

Originally Posted by Kaspias

Tuscan is too Northern for the base Balkan admixture of Thrace, need something in between Apulia and Islander Greek instead.

Almost got no additional Slav:

The same model on me:

Code:

 target      left                     weight     se     z
                                 
1 Kaspias Hungary_Avar_5           0.0528 0.234  0.225
2 Kaspias Bulgaria_IA              0.607  0.191  3.18 
3 Kaspias Russia_Medieval_Nomad.SG 0.341  0.0677 5.03

 f4rank   dof  chisq        p dofdiff chisqdiff  p_nested
                       
1      2     7   4.68 6.98e- 1       9      28.3  8.44e- 4
2      1    16  33.0  7.39e- 3      11     298.   1.94e-57
3      0    27 331.   3.88e-54      NA      NA   NA

It looks like you're getting much closer. Your p-value is now passing at 6.98e-1, which is 0.698!

Your standard errors are not good though, especially for Avar (1 Kaspias Hungary_Avar_5 0.0528 0.234 0.225), because it's saying 5.28% Avar +/- 23.4%.

All this means is that your pright pops are not sufficient to distinguish the genetic difference between Hungary-Avar and Bulgaria-IA. Add a pright pop that you think is genetically much closer to Avar than to Bulgaria-IA, or vice versa.
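
For anyone wanting to check the pass/fail number by hand: the 0.698 appears to be the upper-tail probability of the chisq value at the listed dof. A minimal base-R sketch, assuming that reading of the `p` column; the helper name `chisq_p` is made up here, and the inputs are the rounded values printed in the thread:

```r
# Upper-tail probability of a chi-square statistic with `dof` degrees of freedom
chisq_p <- function(chisq, dof) {
  pchisq(chisq, df = dof, lower.tail = FALSE)
}

# Top row of the three-source model quoted above: chisq = 4.68, dof = 7
p_three_source <- chisq_p(4.68, 7)

# Top row of the earlier Bulgarian.DG + Turkey_Ottoman_2.SG model: chisq = 6.59, dof = 8
p_two_source <- chisq_p(6.59, 8)

print(round(c(three_source = p_three_source, two_source = p_two_source), 3))

# The > 0.05 pass threshold mentioned earlier in the thread
passes <- c(three_source = p_three_source, two_source = p_two_source) > 0.05
print(passes)
```

If this reading is right, the `p` column of the top row is the number to compare against 0.05, while `p_nested` seems to answer a different question (whether the model with one rank less fits about as well), so a tiny `p_nested` would not by itself mean the model fails.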

**Korialstrasz** · 03-03-2021, 06:25 PM

@Kaspias

I am glad that my post helped. Nice to see that you too have managed to run it!

@Zoro

Very helpful advice all around. Thanks again.

---

So I made a few more runs (maxmiss=0, ~93k SNPs) using the 1240K dataset and the following populations. I picked Tepecik for Neolithic Anatolia. Open to suggestions!

Code:

right= c('Russia_DevilsCave_N.SG','Switzerland_Bichon.SG','Morocco_Iberomaurusian','Turkey_TepecikCiftlik_N.SG','Georgia_Kotias.SG','Russia_HG_Karelia', 'Russia_Yana_UP.SG', 'Iran_GanjDareh_N', 'Russia_Kolyma_M.SG')

left = c("Bulgarian.DG","Adygei.DG","Turkmen.SG",'Georgian.DG','Greek_1.DG')

This seems to be the best result; the standard errors could go lower, I guess. The p-values seem OK.
About the z values attached to the weight estimates: what is being tested here? That weight i = 0? It seems like it.
Also, why do we want to fail to reject the model hypothesis? I can't seem to find a layman's interpretation (no surprise).
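
On the z question, the printed values are consistent with z = weight / se, i.e. a test of each weight against 0. A small base-R sketch that recomputes them from the rounded Run 1 numbers below; the two-sided p-values and the 95% intervals assume a normal approximation, which is my assumption and not something the output states:

```r
# Weights and standard errors copied from the Run 1 table
fit <- data.frame(
  left   = c("Adygei.DG", "Turkmen.SG", "Georgian.DG", "Greek_1.DG"),
  weight = c(0.436, 0.051, 0.053, 0.460),
  se     = c(0.267, 0.049, 0.200, 0.127),
  stringsAsFactors = FALSE
)

# z-score of each weight against the null hypothesis weight = 0
fit$z <- fit$weight / fit$se

# Two-sided p-value and rough 95% interval under a normal approximation
fit$p_two_sided <- 2 * pnorm(-abs(fit$z))
fit$lower95 <- fit$weight - 1.96 * fit$se
fit$upper95 <- fit$weight + 1.96 * fit$se

print(fit, digits = 3)
```

The recomputed z values differ from the printed ones only in the second or third decimal, which is what rounding of weight and se would produce.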

Run 1: (Greek and Bulgarian did not go well together, and Greek instead of Bulgarian yielded better results. Georgian seems to be a non-factor here: not significantly different from 0, although I would expect to have around 10%. Adygei, on the other hand, has a high SE here, possibly due to its close proximity to Georgian.)

Code:

=======================================
  target    left     weight  se     z  
---------------------------------------
1   me    Adygei.DG  0.436  0.267 1.636
2   me   Turkmen.SG  0.051  0.049 1.055
3   me   Georgian.DG 0.053   0.2  0.263
4   me   Greek_1.DG   0.46  0.127 3.617
---------------------------------------

the p value = 0.56

====================================================
  f4rank dof  chisq   p   dofdiff chisqdiff p_nested
----------------------------------------------------
1   3     5   3.924  0.56    7     36.838      0    
2   2    12  40.761   0      9     101.281     0    
3   1    21  142.042  0     11     732.627     0    
4   0    32  874.67   0     NA       NA        NA   
----------------------------------------------------

Another run, without Georgian. (Adygei SE is now 0.15)

Code:

======================================
  target    left    weight  se     z  
--------------------------------------
1   me   Adygei.DG  0.489  0.15  3.268
2   me   Turkmen.SG 0.046  0.045 1.006
3   me   Greek_1.DG 0.466  0.13  3.584
--------------------------------------

=====================================================
  f4rank dof  chisq    p   dofdiff chisqdiff p_nested
-----------------------------------------------------
1   2     6   3.965  0.681    8      68.9       0    
2   1    14  72.865    0     10     653.942     0    
3   0    24  726.807   0     NA       NA        NA   
-----------------------------------------------------

bonus 1: me vs the populations I used (f2 statistics). If I am interpreting these correctly it says I am closer to Bulgarians than the Adygei (albeit not by a significant margin). On g25 I get the opposite all the time, with a clear margin.

Code:

====================================================================
   pop1            pop2              est     se        z        p   
--------------------------------------------------------------------
1   me         Bulgarian.DG         9e-04  0.0011   0.82034  0.41202
2   me          Adygei.DG          0.00134 0.00116  1.14901  0.25055
3   me         Georgian.DG         0.00232 0.00111  2.09104  0.03652
4   me          Greek_1.DG         0.00301 0.00148  2.03656  0.04169
5   me       Iran_GanjDareh_N      0.0566  0.00121 46.81045     0   
6   me          Turkmen.SG         0.06436 0.00126 51.18922     0   
7   me      Russia_Yana_UP.SG      0.08537 0.00142 60.25923     0   
8   me    Morocco_Iberomaurusian   0.09281 0.0014   66.0843     0   
9   me    Russia_DevilsCave_N.SG   0.10113 0.00151 66.87772     0   
10  me  Turkey_TepecikCiftlik_N.SG 0.13288 0.00139 95.47373     0   
11  me      Russia_HG_Karelia      0.16392 0.00155 106.00861    0   
12  me      Georgia_Kotias.SG      0.17455 0.00155 112.48209    0   
13  me    Switzerland_Bichon.SG    0.17911 0.00175 102.16275    0   
14  me      Russia_Kolyma_M.SG     0.19436 0.00177 109.87032    0   
--------------------------------------------------------------------

bonus 2: a graph. Perhaps this can give an idea as to whether the chosen populations are satisfactory or not. If the graphs produce nonsense, one can try different populations. This particular one I produced is possibly nonsense since I only used some moderns and pretty ancient populations.

**Zoro** · 03-03-2021, 06:54 PM

Congrats, looking good.

The best way to figure out which samples have the fewest missing SNPs, so that you can use them in your run, is this plink command:

....../plink --bfile Master --missing

This outputs a file called plink.imiss that lists the number of missing SNPs for every sample, so you can use only your best samples.

For example, here's a portion of my plink.imiss file sorted by missingness:

Anatolia_N	Bar8	3563703	4668444	0.76
Anatolia_N	Bar31	3646992	4676043	0.78
Anatolia_N	I0707	3712565	4668444	0.80
Anatolia_N	I0746	3726082	4676043	0.80
Anatolia_N	I0745	3732807	4676043	0.80
Anatolia_N	I0709	3741514	4676043	0.80
Anatolia_N	I0708	3748125	4676043	0.80
Anatolia_N	I1583_publ	3749842	4676043	0.80
Anatolia_N	I1580_publ	3798544	4668444	0.81
Anatolia_N	I0744	3837576	4676043	0.82
Anatolia_N	I1581_publ	3874657	4668444	0.83
Anatolia_N	I1585_publ	3875085	4668444	0.83
Anatolia_N	I1579_publ	3880059	4668444	0.83
Anatolia_N	I0736	3898059	4668444	0.84
Anatolia_N	I1098	3914326	4668444	0.84
Anatolia_N	ZHAG	3921509	4668444	0.84
Anatolia_N	I1096	3957126	4676043	0.85
Anatolia_N	I1097	3959996	4676043	0.85
Anatolia_N	I1101	4047243	4676043	0.87
Anatolia_N	I1103	4099354	4676043	0.88
Anatolia_Ottoman_1.SG	MA2195_final	4109325	4668444	0.88
Anatolia_TepecikCiftlik_N.SG	Tep003	4141647	4676043	0.89

You'll notice the best ENF sample is Bar8, with missingness of only 0.76, followed by Bar31, etc. You'll also notice that the ENF you used (Tep003) is one of the worst for missing SNPs, at a missingness of 0.89.

Next, what I do is go to my Eigenstrat .ind file and add _low to the samples with high missingness that I don't want Admixtools to use.

For example:
Anatolia_N:Bar8 F Anatolia_N
Anatolia_N:Bar31 M Anatolia_N
Anatolia_N:I0707 F Anatolia_N
Anatolia_N:I0708 M Anatolia_N
Anatolia_N:I0709 M Anatolia_N
Anatolia_N:I0736 F Anatolia_N_low
Anatolia_N:I0744 M Anatolia_N
Anatolia_N:I0745 M Anatolia_N
Anatolia_N:I0746 M Anatolia_N
Anatolia_N:I1096 M Anatolia_N_low
Anatolia_N:I1097 M Anatolia_N_low
Anatolia_N:I1098 F Anatolia_N_low
Anatolia_N:I1101 M Anatolia_N_low
Anatolia_N:I1103 M Anatolia_N_low
Anatolia_N:I1579_publ F Anatolia_N
Anatolia_N:I1580_publ F Anatolia_N
Anatolia_N:I1581_publ F Anatolia_N
Anatolia_N:I1583_publ M Anatolia_N
Anatolia_N:I1585_publ F Anatolia_N
Anatolia_N:ZHAG F Anatolia_N_low

Now when I add "Anatolia_N" to extract or pright, only the ENF samples with low missingness are used and the rest are ignored.

You may ask why I don't use only the best 2 ENF samples instead of the best 8. The answer is that the more samples you have, the more accurate the allele frequencies for the population become. So it's a tradeoff between ignoring worse samples and improving allele frequencies.
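
The relabeling step can also be scripted instead of done by hand. A rough base-R sketch using five rows from the listing above: the column names follow PLINK's usual `.imiss` header (a real file also carries a `MISS_PHENO` column, which the listing omits), and the `cutoff` of 0.835 is a made-up value picked only because it falls between the samples kept and the samples marked `_low` in the .ind example:

```r
# Toy stand-in for plink.imiss, built from rows of the listing above
imiss_text <- "FID IID N_MISS N_GENO F_MISS
Anatolia_N Bar8 3563703 4668444 0.76
Anatolia_N I0707 3712565 4668444 0.80
Anatolia_N I1579_publ 3880059 4668444 0.83
Anatolia_N I0736 3898059 4668444 0.84
Anatolia_N I1103 4099354 4676043 0.88"

imiss <- read.table(text = imiss_text, header = TRUE, stringsAsFactors = FALSE)

# Hypothetical missingness threshold; tune it to your own data
cutoff <- 0.835

# Sort best samples first, then append _low to the population label of poor ones
imiss <- imiss[order(imiss$F_MISS), ]
imiss$label <- ifelse(imiss$F_MISS > cutoff,
                      paste0(imiss$FID, "_low"),
                      imiss$FID)

print(imiss[, c("IID", "F_MISS", "label")])
```

The `label` column would then be what goes into the third field of the .ind file; writing it back is left out because the exact .ind layout depends on your dataset.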
