BCE G30 beta - PCA with 6000 samples
vbnetkhio
02-05-2020, 08:37 PM
I made a G25-inspired tool. It includes around 6000 modern samples and around 100 ancient ones.
If anybody's interested, you can test it on Vahaduo with the samples from the spreadsheet and post the results here.
https://vahaduo.github.io/vahaduo/
Spreadsheet download:
https://drive.google.com/open?id=1q4B_KHZoL3jOO9xhM2pwFp8zJXKqmo0S
vbnetkhio
02-05-2020, 08:38 PM
I also made an R oracle:
https://drive.google.com/open?id=18ZO3LKzor5bGNK86k3bPYRmu3LnDvXgN
Instructions for using the R oracle are the same as for this one:
https://www.theapricity.com/forum/showthread.php?307588-g25-all-in-one-unscaled-oracle
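The R oracle script isn't reproduced in this thread. As a rough illustration of what a single-sample distance oracle does (rank every reference sample by its distance to the target), here is a minimal Python sketch. It assumes plain Euclidean distance over the coordinate columns, which may differ from what the linked R oracle actually computes, and the function name `rank_by_distance` is hypothetical.

```python
import math

def rank_by_distance(target, samples, top=20):
    """Rank reference samples by Euclidean distance to the target.

    target  -- list of floats, one value per PC
    samples -- dict: sample name -> list of floats (same length as target)
    Returns up to `top` (name, distance) pairs, closest first.
    """
    ranked = []
    for name, coords in samples.items():
        if len(coords) != len(target):
            raise ValueError(f"{name}: {len(coords)} columns, expected {len(target)}")
        ranked.append((name, math.dist(target, coords)))
    ranked.sort(key=lambda pair: pair[1])
    return ranked[:top]

# Toy 2-PC coordinates (made up for illustration, not from the spreadsheet).
reference = {"A": [0.0, 0.0], "B": [3.0, 4.0], "C": [1.0, 0.0]}
for i, (name, dist) in enumerate(rank_by_distance([0.0, 0.0], reference), start=1):
    # Mimics the R matrix printout: [i,] "name" "distance"
    print(f'[{i},] "{name}" "{dist:.4g}"')
```

A sample's distance to itself is 0, which is why the first row of a single-sample result list is usually the target sample itself.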
vbnetkhio
02-05-2020, 09:01 PM
Here's another version with fewer ancient samples, but it seems more accurate:
https://drive.google.com/open?id=1ITqzJmV9gSZlKVLIHPFXgd34QrYDxG8I
https://drive.google.com/open?id=12O4AQ53RpJuzoroSoq8_7d2wksY8bnKr
Results for a Serbian sample (closest single samples first, then the best two-way mixes; the second column is the distance):
[1,] "Serbian_Serbia1_5624" "0"
[2,] "Bosnian_16_3687" "0.0092"
[3,] "Croatia_Cro198_3868" "0.0103"
[4,] "Bosnian_10_3681" "0.0108"
[5,] "Serbian_Serbia18_5636" "0.011"
[6,] "Serbian_Serbia5_5628" "0.011"
[7,] "Montenegro9_5262" "0.0111"
[8,] "Macedonian9_5168" "0.0116"
[9,] "hungary9_4724" "0.0116"
[10,] "Croatia_B-H1H_3855" "0.0119"
[11,] "Bosnian_12_3683" "0.0119"
[12,] "Serbian_Serbia2_5625" "0.012"
[13,] "hungary7_4722" "0.012"
[14,] "Bosnian_5_3677" "0.012"
[15,] "Serbian_Serbia3_5626" "0.0123"
[16,] "Bosnian_14_3685" "0.0123"
[17,] "Bosnian_4_3676" "0.0123"
[18,] "Montenegro5_5259" "0.0124"
[19,] "hungary14_4729" "0.0125"
[20,] "Montenegro10_5263" "0.0125"
[1,] "68.3 % Bosnian_16_3687 + 31.7 % hungary16_4731" "0.0074"
[2,] "5.3 % NA19147.SG_YRI.SG_Yoruba_Ibadan_Nigeria_2871 + 94.7 % Bosnian_16_3687" "0.0078"
[3,] "43 % NA20799.SG_TSI.SG_Tuscany_Italy_3140 + 57 % Poland137_5461" "0.0079"
[4,] "64.1 % Bosnian_16_3687 + 35.9 % hungary9_4724" "0.0079"
[5,] "17.3 % NA12828.SG_CEU.SG_Utah_USA_2562 + 82.7 % Bosnian_16_3687" "0.0079"
[6,] "58.2 % Bosnian_16_3687 + 41.8 % Croatia_Cro198_3868" "0.0079"
[7,] "53.7 % Croatia_Cro198_3868 + 46.3 % Serbian_Serbia18_5636" "0.008"
[8,] "4.5 % NA19184.SG_YRI.SG_Yoruba_Ibadan_Nigeria_2880 + 95.5 % Bosnian_16_3687" "0.0081"
[9,] "46.2 % hungary20_4735 + 53.8 % Montenegro10_5263" "0.0081"
[10,] "40.1 % hungary16_4731 + 59.9 % Montenegro9_5262" "0.0081"
[11,] "10.2 % HG00121.SGGBR.SG.._Great_Britain_1207 + 89.8 % Bosnian_16_3687" "0.0081"
[12,] "42 % Croatia_Cro344_3872 + 58 % hungary20_4735" "0.0081"
[13,] "70.8 % Croatia_Cro198_3868 + 29.2 % Croatia_Cro344_3872" "0.0081"
[14,] "44 % NA20799.SG_TSI.SG_Tuscany_Italy_3140 + 56 % Pole1" "0.0081"
[15,] "68.6 % Bosnian_16_3687 + 31.4 % hungary14_4729" "0.0082"
[16,] "76.3 % Bosnian_16_3687 + 23.7 % Ger2" "0.0082"
[17,] "14.4 % vik_stg021_Sweden_Viking.SG_924_ybp_3410 + 85.6 % Bosnian_16_3687" "0.0082"
[18,] "54.5 % Croatia_Cro198_3868 + 45.5 % Montenegro9_5262" "0.0082"
[19,] "51.6 % Bulgaria8_3699 + 48.4 % hungary20_4735" "0.0082"
[20,] "43.5 % belarusian47zp_3658 + 56.5 % GreeceThessaly3_4264" "0.0082"
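The second list is a two-way mix search: each row models the target as a weighted blend of two reference samples. The thread doesn't say how the R oracle finds the weights. One straightforward way, assumed here, uses the fact that for a fixed pair A, B the squared distance from the target t to p*A + (1 - p)*B is a quadratic in p, so the best weight has the closed form p = ((t - B)·(A - B)) / |A - B|^2, clipped to [0, 1]. The function name `best_two_way` is hypothetical.

```python
import itertools
import math

def best_two_way(target, samples, top=20):
    """Model target ~ p*A + (1-p)*B for every pair of reference samples.

    The squared distance is quadratic in p, minimised at
    p = ((t - B) . (A - B)) / |A - B|^2, then clipped to [0, 1].
    Returns up to `top` (distance, p, name_a, name_b) tuples, best fit first.
    """
    fits = []
    for (name_a, a), (name_b, b) in itertools.combinations(samples.items(), 2):
        diff = [x - y for x, y in zip(a, b)]  # A - B
        denom = sum(d * d for d in diff)
        if denom == 0:
            p = 1.0  # identical samples: every p gives the same mix
        else:
            num = sum((t - y) * d for t, y, d in zip(target, b, diff))
            p = min(1.0, max(0.0, num / denom))
        mix = [p * x + (1 - p) * y for x, y in zip(a, b)]
        fits.append((math.dist(target, mix), p, name_a, name_b))
    fits.sort()
    return fits[:top]

# Toy 2-PC coordinates (made up for illustration).
reference = {"A": [1.0, 0.0], "B": [0.0, 0.0], "C": [0.0, 5.0]}
for i, (dist, p, a, b) in enumerate(best_two_way([0.5, 0.0], reference), start=1):
    print(f'[{i},] "{p * 100:.1f} % {a} + {(1 - p) * 100:.1f} % {b}" "{dist:.4g}"')
```

With n reference samples this checks n(n-1)/2 pairs; for around 6000 samples that is roughly 18 million fits, which is where a vectorised or compiled implementation starts to matter.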
MagnusDark
02-05-2020, 09:12 PM
......
Could you run my coordinates if it's no trouble?
MagnusDark_scaled,0.127482,0.150298,0.020365,-0.02584,0.028621,-0.007251,0.00611,0.002769,-0.002863,0.022233,0.001624,0.007343,-0.011596,0.003165,-0.02158,0.006895,0.030379,-0.00266,0.006159,-0.007379,-0.010606,0.000124,0.000986,0.000964,-0.002036
MagnusDark,0.0112,0.0148,0.0054,-0.008,0.0093,-0.0026,0.0026,0.0012,-0.0014,0.0122,0.001,0.0049,-0.0078,0.0023,-0.0159,0.0052,0.0233,-0.0021,0.0049,-0.0059,-0.0085,0.0001,0.0008,0.0008,-0.0017
vbnetkhio
02-05-2020, 09:13 PM
Could you run my coordinates if it's no trouble?
MagnusDark_scaled,0.127482,0.150298,0.020365,-0.02584,0.028621,-0.007251,0.00611,0.002769,-0.002863,0.022233,0.001624,0.007343,-0.011596,0.003165,-0.02158,0.006895,0.030379,-0.00266,0.006159,-0.007379,-0.010606,0.000124,0.000986,0.000964,-0.002036
MagnusDark,0.0112,0.0148,0.0054,-0.008,0.0093,-0.0026,0.0026,0.0012,-0.0014,0.0122,0.001,0.0049,-0.0078,0.0023,-0.0159,0.0052,0.0233,-0.0021,0.0049,-0.0059,-0.0085,0.0001,0.0008,0.0008,-0.0017
It's not G25, it's its own tool.
firemonkey
02-05-2020, 11:12 PM
How the hell does this work? Where do we get the 30 co-ordinates from?
How the hell does this work? Where do we get the 30 co-ordinates from?
€10 via Paypal and vbnetkhio and I will add more co-ordinates for you.
Wait, do you Brits still use euros? If not, then US dollars please.
Kaspias
02-06-2020, 08:19 AM
Nice work
Bosniensis
02-06-2020, 08:24 AM
I made a G25-inspired tool. It includes around 6000 modern samples and around 100 ancient ones.
If anybody's interested, you can test it on Vahaduo with the samples from the spreadsheet and post the results here.
https://vahaduo.github.io/vahaduo/
Spreadsheet download:
https://drive.google.com/open?id=1q4B_KHZoL3jOO9xhM2pwFp8zJXKqmo0S
howtfthis works :mad:
errors everywhere on vahduhuoho
vbnetkhio
02-06-2020, 08:51 AM
howtfthis works :mad:
errors everywhere on vahduhuoho
It's in the testing phase, Dr. Bosniensis :D
You can test the Bosnian samples from the sheet and see what they get.
Lucas
02-06-2020, 10:04 AM
€10 via Paypal and vbnetkhio and I will add more co-ordinates for you.
Wait, do you Brits still use euros? If not, then US dollars please.
So putting together the whole old HGDP and 1000 Genomes datasets and a few others from the Estonian Biocentre without pruning outliers (think about why Davidski doesn't use all those samples: because it's useless) will be competition for the G25 coordinates, which encompass all possible modern and ancient datasets? :)
vbnetkhio
02-06-2020, 10:20 AM
So putting together the whole old HGDP and 1000 Genomes datasets and a few others from the Estonian Biocentre without pruning outliers (think about why Davidski doesn't use all those samples: because it's useless) will be competition for the G25 coordinates, which encompass all possible modern and ancient datasets? :)
Dick is kidding, I just made this for fun.
Also, the second version gives very nice results for the Serb, like K13 or the G25 2-way oracle: Polish/Belarusian + Italian/Greek.
Lucas
02-06-2020, 10:23 AM
Dick is kidding, I just made this for fun.
Also, the second version gives very nice results for the Serb, like K13 or the G25 2-way oracle: Polish/Belarusian + Italian/Greek.
Ok:)
It would be interesting if you made a 25-component PCA instead of a 30-component one and checked whether those coordinates would more or less work in G25. Some time ago I thought about such an experiment, but I didn't have time.
You can check it.
vbnetkhio
02-06-2020, 10:26 AM
Ok:)
It would be interesting if you made a 25-component PCA instead of a 30-component one and checked whether those coordinates would more or less work in G25. Some time ago I thought about such an experiment, but I didn't have time.
You can check it.
smartpca takes 30-60 minutes, even with this many samples. I wanted to make an admixture calculator, but it would take months to finish on my PC.
Lucas
02-06-2020, 10:29 AM
smartpca takes 30-60 minutes, even with this many samples. I wanted to make an admixture calculator, but it would take months to finish on my PC.
So can you check if it is possible?
vbnetkhio
02-06-2020, 10:39 AM
So can you check if it is possible?
OK, I can run the second version, with fewer ancient samples; it's more similar to G25.
But I think I would have to use the exact same samples Davidski used for it to work.
Lucas
02-06-2020, 10:40 AM
OK, I can run the second version, with fewer ancient samples; it's more similar to G25.
But I think I would have to use the exact same samples Davidski used for it to work.
Yes, to be 100% exact. But maybe the results would be, let's say, 80% similar.
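Nobody in the thread defines what "80% similar" would mean. One possible yardstick, offered only as an assumption, is to take the samples present in both coordinate sets and correlate their pairwise distances. Comparing distances rather than raw coordinates sidesteps the fact that two separate PCA runs can produce components with flipped signs or a different orientation. The name `distance_agreement` is hypothetical.

```python
import itertools
import math

def distance_agreement(coords_a, coords_b):
    """Pearson correlation of pairwise distances between shared samples.

    coords_a, coords_b -- dicts: sample name -> coordinates; the number of
    PCs may differ between the two sets (e.g. 25 vs 30).
    Needs at least three shared samples with non-constant distances.
    Returns a value in [-1, 1]; 1 means the two sets of distances are
    perfectly linearly related.
    """
    shared = sorted(coords_a.keys() & coords_b.keys())
    dist_a, dist_b = [], []
    for x, y in itertools.combinations(shared, 2):
        dist_a.append(math.dist(coords_a[x], coords_a[y]))
        dist_b.append(math.dist(coords_b[x], coords_b[y]))
    mean_a = sum(dist_a) / len(dist_a)
    mean_b = sum(dist_b) / len(dist_b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(dist_a, dist_b))
    var_a = sum((p - mean_a) ** 2 for p in dist_a)
    var_b = sum((q - mean_b) ** 2 for q in dist_b)
    return cov / math.sqrt(var_a * var_b)

# Made-up example: the second set is the first with PC1 sign-flipped and everything doubled.
run_1 = {"p": [0.0, 0.0], "q": [1.0, 0.0], "r": [0.0, 3.0]}
run_2 = {"p": [0.0, 0.0], "q": [-2.0, 0.0], "r": [0.0, 6.0]}
print(round(distance_agreement(run_1, run_2), 6))
```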
vbnetkhio
02-06-2020, 10:41 AM
https://i.imgur.com/klZ3QZE.png
Classic model for a Serbian sample.
It's similar to the G25 Serbian results (Decius):
https://www.theapricity.com/forum/showthread.php?302219-My-G25-results&p=6258846&viewfull=1#post6258846
vbnetkhio
02-06-2020, 11:07 AM
So putting together the whole old HGDP and 1000 Genomes datasets and a few others from the Estonian Biocentre without pruning outliers (think about why Davidski doesn't use all those samples: because it's useless) will be competition for the G25 coordinates, which encompass all possible modern and ancient datasets? :)
It's HGDP, Reich Human Origins (which includes 1000 Genomes and SGDP), and everything from the Estonian Biocentre, not just some samples.
And I removed the tri-racial samples from Argentina, Peru and Puerto Rico, the Afro-Americans and Afro-Caribbeans, the chimp and the gorilla (lol), and all ancient samples with less than 90% of the SNPs.
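The "less than 90% SNPs" cut for ancient samples amounts to a per-sample call-rate filter. A minimal sketch, assuming the threshold means the share of the merged panel's SNPs that have a genotype call in that sample (the thread doesn't spell this out); `filter_by_coverage` is a hypothetical helper. For data in PLINK format, `--mind 0.1` applies the same kind of cut by removing samples with more than 10% missing calls.

```python
def filter_by_coverage(called_snps, total_snps, min_fraction=0.9):
    """Split samples by the fraction of panel SNPs that have a genotype call.

    called_snps  -- dict: sample name -> number of SNPs with a call
    total_snps   -- number of SNPs in the merged panel
    min_fraction -- keep samples at or above this fraction (0.9 = 90%)
    Returns (kept, dropped) lists of sample names.
    """
    kept, dropped = [], []
    for name, called in called_snps.items():
        if called / total_snps >= min_fraction:
            kept.append(name)
        else:
            dropped.append(name)
    return kept, dropped

# Made-up call counts against a hypothetical 1,000-SNP panel.
kept, dropped = filter_by_coverage({"anc1": 950, "anc2": 899, "anc3": 900}, 1000)
print("kept:", kept, "dropped:", dropped)
```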
Bosniensis
02-06-2020, 11:11 AM
Man, I love these calcs.
I hope to try at least 500 more.
I'll make a calc as well.
ph2ter
02-06-2020, 11:43 AM
More coordinates = more computing power needed
=not very practical
Lucas
02-06-2020, 11:44 AM
More coordinates = more computing power needed
=not very practical
I guess that was the reason why Davidski kept only the most representative samples for each population rather than all possible ones, besides pruning obvious outliers.
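Davidski's actual selection and pruning criteria aren't described in this thread. As an illustration only, one common robust approach is to measure each sample's distance to its population's median position and drop samples that sit far beyond the typical spread (median distance plus k times the median absolute deviation). `prune_outliers` and the default k = 3 are assumptions, not anyone's published settings.

```python
import math
import statistics

def prune_outliers(populations, k=3.0):
    """Drop samples that sit far from the rest of their population.

    populations -- dict: population -> dict: sample name -> coordinates
    A sample is dropped if its distance to the population's per-PC median
    exceeds median(distances) + k * MAD(distances).
    """
    pruned = {}
    for pop, samples in populations.items():
        centre = [statistics.median(col) for col in zip(*samples.values())]
        dists = {name: math.dist(c, centre) for name, c in samples.items()}
        med = statistics.median(dists.values())
        mad = statistics.median(abs(d - med) for d in dists.values())
        if mad == 0:
            # No spread to judge against (e.g. very small groups): keep everyone.
            pruned[pop] = dict(samples)
            continue
        cutoff = med + k * mad
        pruned[pop] = {n: samples[n] for n, d in dists.items() if d <= cutoff}
    return pruned

# Made-up 2-PC coordinates: s5 is an obvious outlier.
toy = {"X": {"s1": [0.0, 0.0], "s2": [0.1, 0.0], "s3": [0.0, 0.2],
             "s4": [-0.3, 0.0], "s5": [5.0, 5.0]}}
print(sorted(prune_outliers(toy)["X"]))
```

Using medians for both the centre and the spread keeps a single extreme sample from dragging the reference point towards itself, which a plain mean would do.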
firemonkey
02-06-2020, 01:00 PM
€10 via Paypal and vbnetkhio and I will add more co-ordinates for you.
Wait, do you Brits still use euros? If not, then US dollars please.
Is vbnetkhio you using an alternative account, then? Or should there be a comma before the second 'and'? I have no idea what email address to send the payment to.
Lucas
02-06-2020, 01:08 PM
Is vbnetkhio you using an alternative account, then? Or should there be a comma before the second 'and'? I have no idea what email address to send the payment to.
Read this post https://www.theapricity.com/forum/showthread.php?314681-BCE-G30-beta-PCA-with-6000-samples&p=6488194&viewfull=1#post6488194
firemonkey
02-06-2020, 01:56 PM
Read this post https://www.theapricity.com/forum/showthread.php?314681-BCE-G30-beta-PCA-with-6000-samples&p=6488194&viewfull=1#post6488194
So basically it's a load of crap. The impression I get is that there's been a boom in people wanting to develop calculators, with it being a case of "Never mind the quality, feel the width".
So basically it's a load of crap. The impression I get is that there's been a boom in people wanting to develop calculators, with it being a case of "Never mind the quality, feel the width".
It’s not the length that counts, it’s the girth. Said no woman ever.
vbnetkhio
02-06-2020, 02:41 PM
More coordinates = more computing power needed
=not very practical
100 PCs took an hour and a half or so to calculate on my weak PC; 30 PCs took around half an hour. It's very fast.
vbnetkhio
02-06-2020, 02:47 PM
So basically it's a load of crap. The impression I get is that there's been a boom in people wanting to develop calculators, with it being a case of "Never mind the quality, feel the width".
Actually, nobody has been making new calcs lately. The last really good one was Eurogenes K13, which came out more than 10 years ago.
Loads of new data have been published since K13, and nobody is making use of it.
I wanted to see if anything changes when I include that data, and whether it can be pushed further than K13.
Only Supreeeeme, Tolan and I have made calculators in the last year. Of the projects on GEDmatch, only Eurogenes is still active. How is that a boom?
vbnetkhio
02-07-2020, 03:33 PM
https://i.imgur.com/gl7bDte.png
ERROR! Column number mismatch.
I pasted the spreadsheets and my coords; it won't work for me.
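A "Column number mismatch" error on Vahaduo presumably means the pasted rows don't all have the same number of coordinates. One plausible cause, assuming G25 coordinates were pasted against this tool's sheets: G25 rows carry 25 values, while this tool, per the thread, has 30 PCs. A minimal sketch (the helper name `column_mismatches` is hypothetical) that flags any row whose coordinate count differs from what the sheet expects, checked against the unscaled row MagnusDark posted earlier in the thread:

```python
def column_mismatches(text, expected):
    """Return (name, n_columns) for every row whose coordinate count != expected.

    Each non-empty line is assumed to be: name,pc1,pc2,...
    """
    bad = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, *values = line.split(",")
        if len(values) != expected:
            bad.append((name, len(values)))
    return bad

# Unscaled row posted by MagnusDark earlier in the thread (G25 format).
row = ("MagnusDark,0.0112,0.0148,0.0054,-0.008,0.0093,-0.0026,0.0026,0.0012,"
       "-0.0014,0.0122,0.001,0.0049,-0.0078,0.0023,-0.0159,0.0052,0.0233,"
       "-0.0021,0.0049,-0.0059,-0.0085,0.0001,0.0008,0.0008,-0.0017")
print(column_mismatches(row, expected=30))
```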
celticdragongod
02-07-2020, 09:48 PM
https://i.imgur.com/gl7bDte.png
Where are the Celts?
vbnetkhio
02-07-2020, 10:16 PM
Where are the Celts?
Welsh and Orcadians are between British and Germanic, more northern-shifted than most of the "British" samples.
"Germanic" is mostly Swedes; most Germans are down there with the French. Maybe I should have called it Scandinavian.