Quote (originally posted by Lucas):
I said it before. During such big runs, a few random samples always get strange results like this. Try checking them again separately; they should be normal. And I am sure they are not references: the old K13 used a different survey, so it is not a calculator effect.
BTW, Vologda Russians are references everywhere, so they score those high values for real, unlike those new Lithuanians.
Actually, I think you need to make the tolerance parameter (`-t`) smaller (https://github.com/stevenliuyi/admix#faq):
This package utilizes the optimization function `scipy.optimize.minimize` from the SciPy library, which has a parameter `tol` to control the tolerance for termination of the optimizer. The default tolerance is set to `1e-3` here. It works most of the time, but sometimes `1e-3` is too big and causes early termination. You can manually set a smaller tolerance (say `1e-4`) to obtain correct results, although it will take longer to run the optimizer.
michal3141 also got wrong results in K36 with the default tolerance, and he said that "probably e-7 or at least e-6 is the way to go": https://anthrogenica.com/showthread....l=1#post810669.
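To see where `tol` enters, here is a toy version of the problem. This is a minimal sketch, not the admix package's code: it assumes a simple binomial likelihood for one diploid sample, three made-up source populations, and the SLSQP method with a sum-to-one constraint (whether admix uses this exact method and likelihood is an assumption). Only `scipy.optimize.minimize` and its `tol` argument come from the FAQ; `simulate`, `neg_log_lik`, `neg_log_lik_grad` and `estimate` are hypothetical helpers. For SLSQP, SciPy passes `tol` on as `ftol`, the precision goal for the objective value in the stopping test.

```python
import numpy as np
from scipy.optimize import minimize

def simulate(n_snps=2000, seed=0):
    """Hypothetical toy data: allele frequencies for 3 source populations and
    one diploid genotype vector drawn from a known 70/20/10 mixture."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.95, size=(n_snps, 3))  # SNPs x populations
    true_q = np.array([0.7, 0.2, 0.1])
    genotypes = rng.binomial(2, freqs @ true_q)          # allele counts 0, 1 or 2
    return freqs, genotypes

def neg_log_lik(q, freqs, g):
    """Binomial negative log-likelihood of genotypes g given proportions q."""
    p = np.clip(freqs @ q, 1e-9, 1 - 1e-9)
    return -np.sum(g * np.log(p) + (2 - g) * np.log(1 - p))

def neg_log_lik_grad(q, freqs, g):
    """Analytic gradient of neg_log_lik with respect to q."""
    p = np.clip(freqs @ q, 1e-9, 1 - 1e-9)
    return -freqs.T @ (g / p - (2 - g) / (1 - p))

def estimate(freqs, g, tol):
    """Maximum-likelihood proportions on the simplex; tol controls termination."""
    k = freqs.shape[1]
    res = minimize(
        neg_log_lik,
        x0=np.full(k, 1.0 / k),  # start from equal proportions
        args=(freqs, g),
        jac=neg_log_lik_grad,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * k,
        constraints=[{"type": "eq",
                      "fun": lambda q: np.sum(q) - 1.0,
                      "jac": lambda q: np.ones_like(q)}],
        tol=tol,
    )
    return res.x

if __name__ == "__main__":
    freqs, g = simulate()
    for tol in (1e-1, 1e-3, 1e-6, 1e-8):
        q = estimate(freqs, g, tol)
        print(f"tol={tol:g}: " + "  ".join(f"{100 * x:.2f}%" for x in q))
```

Printing the estimates at several tolerances shows whether the looser settings stop short on this toy problem. Because the objective here is convex, the tight settings should all land on the same answer, which is the stability check the rest of this post does by hand.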
When I reran one of the Vologda samples that got over 99% Baltic (HGDP00899), it got the same result again, because unlike ADMIXTURE the algorithm doesn't use a random seed, so it gives the same result every time. After I decreased the tolerance to 1e-4, the results started to resemble those of other samples from Vologda. Between 1e-4 and 1e-5, some admixture proportions still changed by more than 0.3 percentage points, but there was no further change between 1e-5 and 1e-6:
Code:
~ admix -f a.txt -mK13
Admixture calculation models: K13
Calcuation is started...
K13
North_Atlantic: 0.12%
Baltic: 99.76%
West_Med: 0.00%
West_Asian: 0.00%
East_Med: 0.00%
Red_Sea: 0.00%
South_Asian: 0.00%
East_Asian: 0.00%
Siberian: 0.07%
Amerindian: 0.00%
Oceanian: 0.05%
Northeast_African: 0.00%
Sub-Saharan: 0.00%
~ admix -f a.txt -mK13 -t1e-4
Admixture calculation models: K13
Calcuation is started...
K13
North_Atlantic: 3.95%
Baltic: 71.60%
West_Med: 5.06%
West_Asian: 2.74%
East_Med: 6.62%
Red_Sea: 0.67%
South_Asian: 0.00%
East_Asian: 0.16%
Siberian: 6.24%
Amerindian: 1.26%
Oceanian: 1.31%
Northeast_African: 0.04%
Sub-Saharan: 0.36%
~ admix -f a.txt -mK13 -t1e-5
Admixture calculation models: K13
Calcuation is started...
K13
North_Atlantic: 4.09%
Baltic: 71.69%
West_Med: 4.73%
West_Asian: 2.64%
East_Med: 6.46%
Red_Sea: 0.91%
South_Asian: 0.00%
East_Asian: 0.11%
Siberian: 6.05%
Amerindian: 1.37%
Oceanian: 1.55%
Northeast_African: 0.03%
Sub-Saharan: 0.36%
~ admix -f a.txt -mK13 -t1e-6
Admixture calculation models: K13
Calcuation is started...
K13
North_Atlantic: 4.09%
Baltic: 71.69%
West_Med: 4.73%
West_Asian: 2.64%
East_Med: 6.46%
Red_Sea: 0.91%
South_Asian: 0.00%
East_Asian: 0.11%
Siberian: 6.05%
Amerindian: 1.37%
Oceanian: 1.55%
Northeast_African: 0.03%
Sub-Saharan: 0.36%
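Runs like these can also be compared programmatically instead of by eye. A minimal sketch, assuming only the `Name: 12.34%` line format printed above; `parse_admix_output` and `largest_change` are hypothetical helpers, not part of admix.

```python
import re

# Matches result lines such as "Baltic: 71.60%" or "Sub-Saharan: 0.36%"
RESULT_LINE = re.compile(r"^\s*([A-Za-z_-]+):\s*(\d+(?:\.\d+)?)%\s*$")

def parse_admix_output(text):
    """Return {component: percentage} from the lines of one printed run."""
    proportions = {}
    for line in text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            proportions[match.group(1)] = float(match.group(2))
    return proportions

def largest_change(a, b):
    """Component with the largest absolute change (percentage points) between two runs."""
    shared = sorted(a.keys() & b.keys())
    return max(((name, abs(a[name] - b[name])) for name in shared), key=lambda kv: kv[1])

# The -t1e-4 and -t1e-5 runs from above
run_1e4 = """K13
North_Atlantic: 3.95%
Baltic: 71.60%
West_Med: 5.06%
West_Asian: 2.74%
East_Med: 6.62%
Red_Sea: 0.67%
South_Asian: 0.00%
East_Asian: 0.16%
Siberian: 6.24%
Amerindian: 1.26%
Oceanian: 1.31%
Northeast_African: 0.04%
Sub-Saharan: 0.36%"""

run_1e5 = """K13
North_Atlantic: 4.09%
Baltic: 71.69%
West_Med: 4.73%
West_Asian: 2.64%
East_Med: 6.46%
Red_Sea: 0.91%
South_Asian: 0.00%
East_Asian: 0.11%
Siberian: 6.05%
Amerindian: 1.37%
Oceanian: 1.55%
Northeast_African: 0.03%
Sub-Saharan: 0.36%"""

print(largest_change(parse_admix_output(run_1e4), parse_admix_output(run_1e5)))
```

On these two runs it reports West_Med, with a change of about 0.33 percentage points, which is the "more than 0.3 percentage points" mentioned above.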
I ran K13 with different tolerance values for all 87 samples in the Reich dataset with the population name "Russian", because they included some problematic samples in my earlier K13 run. The average difference in admixture percentages dropped below 0.01 between 1e-4 and 1e-5, and below 0.0001 between 1e-5 and 1e-6:
Tolerance | Average difference in admixture percentages compared to previous tolerance | Running time per sample
1e-1 | - | 1.22
1e-2 | 3.144810 | 1.65
1e-3 | 3.515349 | 3.12
1e-4 | 0.316737 | 3.47
1e-5 | 0.002387 | 3.64
1e-6 | 0.000097 | 3.51
1e-7 | 0.000062 | 3.65
1e-8 | 0.000088 | 3.65
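The post does not define the "average difference" column exactly; one plausible reading, assumed here, is the mean absolute difference over all samples and all components between consecutive tolerances. A sketch of that computation with NumPy, where `results` is a hypothetical mapping from tolerance to an (n_samples x n_components) array of percentages and `average_differences` is a hypothetical helper:

```python
import numpy as np

def average_differences(results):
    """For each tolerance after the loosest, return the mean absolute difference
    (over every sample and component) from the previous, looser tolerance.
    Assumption: this is how the 'average difference' column is defined."""
    tols = sorted(results, reverse=True)  # loosest first, e.g. 1e-1, 1e-2, ...
    out = {}
    for prev, cur in zip(tols, tols[1:]):
        out[cur] = float(np.mean(np.abs(results[cur] - results[prev])))
    return out

# Made-up toy numbers (2 samples x 3 components), not the Reich data
toy = {
    1e-3: np.array([[99.0, 1.0, 0.0], [50.0, 30.0, 20.0]]),
    1e-4: np.array([[70.0, 20.0, 10.0], [50.0, 30.0, 20.0]]),
    1e-5: np.array([[70.0, 20.0, 10.0], [50.0, 30.0, 20.0]]),
}
print(average_differences(toy))
```

Averaging over all samples means a single badly converged sample (like HGDP00899 at the default tolerance) can dominate the column, which fits the large values at 1e-2 and 1e-3.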
So it's probably better to lower the tolerance to 1e-5 or smaller; in this case even 1e-8 was about as fast.
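The manual procedure from this post (rerun with a tenfold smaller tolerance until the proportions stop moving) can be written as a small loop. A sketch only: `estimate` is a hypothetical callable that returns the proportions for a given tolerance (for example, a wrapper around however you run admix), and the 0.01 percentage-point threshold is an arbitrary choice, not a recommendation from the package.

```python
from typing import Callable, Sequence, Tuple

def choose_tolerance(
    estimate: Callable[[float], Sequence[float]],
    tols: Sequence[float] = (1e-3, 1e-4, 1e-5, 1e-6, 1e-7, 1e-8),
    threshold: float = 0.01,
) -> Tuple[float, Sequence[float]]:
    """Step through increasingly tight tolerances and return the first one whose
    result agrees with the next tighter run to within `threshold`
    (in the same units `estimate` returns)."""
    prev_tol, prev = tols[0], estimate(tols[0])
    for tol in tols[1:]:
        cur = estimate(tol)
        if max(abs(a - b) for a, b in zip(prev, cur)) <= threshold:
            return prev_tol, prev  # prev already matches the tighter run
        prev_tol, prev = tol, cur
    return prev_tol, prev  # never stabilised: fall back to the tightest tolerance

# HGDP00899 K13 percentages from the runs above (1e-3 is the default tolerance)
runs = {
    1e-3: [0.12, 99.76, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.07, 0.00, 0.05, 0.00, 0.00],
    1e-4: [3.95, 71.60, 5.06, 2.74, 6.62, 0.67, 0.00, 0.16, 6.24, 1.26, 1.31, 0.04, 0.36],
    1e-5: [4.09, 71.69, 4.73, 2.64, 6.46, 0.91, 0.00, 0.11, 6.05, 1.37, 1.55, 0.03, 0.36],
    1e-6: [4.09, 71.69, 4.73, 2.64, 6.46, 0.91, 0.00, 0.11, 6.05, 1.37, 1.55, 0.03, 0.36],
}
print(choose_tolerance(runs.__getitem__))
```

With the numbers from this post the loop stops at 1e-5, because the 1e-5 and 1e-6 runs are identical, matching the conclusion above. Returning the looser tolerance of the first stable pair keeps the run time down once the results no longer change.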