BCE G30 beta - PCA with 6000 samples



vbnetkhio
02-05-2020, 08:37 PM
I made a G25-inspired tool. It includes around 6000 modern samples and around 100 ancient ones.
If anybody's interested, you can test it on Vahaduo with the samples from the spreadsheet and post the results here.

https://vahaduo.github.io/vahaduo/

spreadsheet download:
https://drive.google.com/open?id=1q4B_KHZoL3jOO9xhM2pwFp8zJXKqmo0S

vbnetkhio
02-05-2020, 08:38 PM
I also made an R oracle:
https://drive.google.com/open?id=18ZO3LKzor5bGNK86k3bPYRmu3LnDvXgN

The instructions for using the R oracle are the same as for this one:
https://www.theapricity.com/forum/showthread.php?307588-g25-all-in-one-unscaled-oracle

vbnetkhio
02-05-2020, 09:01 PM
Here's another version with fewer ancient samples, but it seems more accurate:

https://drive.google.com/open?id=1ITqzJmV9gSZlKVLIHPFXgd34QrYDxG8I
https://drive.google.com/open?id=12O4AQ53RpJuzoroSoq8_7d2wksY8bnKr

results of a Serbian sample:


[1,] "Serbian_Serbia1_5624" "0"
[2,] "Bosnian_16_3687" "0.0092"
[3,] "Croatia_Cro198_3868" "0.0103"
[4,] "Bosnian_10_3681" "0.0108"
[5,] "Serbian_Serbia18_5636" "0.011"
[6,] "Serbian_Serbia5_5628" "0.011"
[7,] "Montenegro9_5262" "0.0111"
[8,] "Macedonian9_5168" "0.0116"
[9,] "hungary9_4724" "0.0116"
[10,] "Croatia_B-H1H_3855" "0.0119"
[11,] "Bosnian_12_3683" "0.0119"
[12,] "Serbian_Serbia2_5625" "0.012"
[13,] "hungary7_4722" "0.012"
[14,] "Bosnian_5_3677" "0.012"
[15,] "Serbian_Serbia3_5626" "0.0123"
[16,] "Bosnian_14_3685" "0.0123"
[17,] "Bosnian_4_3676" "0.0123"
[18,] "Montenegro5_5259" "0.0124"
[19,] "hungary14_4729" "0.0125"
[20,] "Montenegro10_5263" "0.0125"

[1,] "68.3 % Bosnian_16_3687 + 31.7 % hungary16_4731" "0.0074"
[2,] "5.3 % NA19147.SG_YRI.SG_Yoruba_Ibadan_Nigeria_2871 + 94.7 % Bosnian_16_3687" "0.0078"
[3,] "43 % NA20799.SG_TSI.SG_Tuscany_Italy_3140 + 57 % Poland137_5461" "0.0079"
[4,] "64.1 % Bosnian_16_3687 + 35.9 % hungary9_4724" "0.0079"
[5,] "17.3 % NA12828.SG_CEU.SG_Utah_USA_2562 + 82.7 % Bosnian_16_3687" "0.0079"
[6,] "58.2 % Bosnian_16_3687 + 41.8 % Croatia_Cro198_3868" "0.0079"
[7,] "53.7 % Croatia_Cro198_3868 + 46.3 % Serbian_Serbia18_5636" "0.008"
[8,] "4.5 % NA19184.SG_YRI.SG_Yoruba_Ibadan_Nigeria_2880 + 95.5 % Bosnian_16_3687" "0.0081"
[9,] "46.2 % hungary20_4735 + 53.8 % Montenegro10_5263" "0.0081"
[10,] "40.1 % hungary16_4731 + 59.9 % Montenegro9_5262" "0.0081"
[11,] "10.2 % HG00121.SGGBR.SG.._Great_Britain_1207 + 89.8 % Bosnian_16_3687" "0.0081"
[12,] "42 % Croatia_Cro344_3872 + 58 % hungary20_4735" "0.0081"
[13,] "70.8 % Croatia_Cro198_3868 + 29.2 % Croatia_Cro344_3872" "0.0081"
[14,] "44 % NA20799.SG_TSI.SG_Tuscany_Italy_3140 + 56 % Pole1" "0.0081"
[15,] "68.6 % Bosnian_16_3687 + 31.4 % hungary14_4729" "0.0082"
[16,] "76.3 % Bosnian_16_3687 + 23.7 % Ger2" "0.0082"
[17,] "14.4 % vik_stg021_Sweden_Viking.SG_924_ybp_3410 + 85.6 % Bosnian_16_3687" "0.0082"
[18,] "54.5 % Croatia_Cro198_3868 + 45.5 % Montenegro9_5262" "0.0082"
[19,] "51.6 % Bulgaria8_3699 + 48.4 % hungary20_4735" "0.0082"
[20,] "43.5 % belarusian47zp_3658 + 56.5 % GreeceThessaly3_4264" "0.0082"
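For anyone wondering how to read the two lists above: the first looks like the closest single samples by distance, the second like the best-fitting 2-way mixes. Here is a minimal Python sketch of that idea, assuming plain Euclidean distance on the PCA coordinates. The function names and toy data are made up for illustration, and this is not the actual R oracle.

```python
import numpy as np

def nearest_samples(target, names, coords, top=5):
    """Rank reference samples by Euclidean distance to the target."""
    dists = np.linalg.norm(coords - target, axis=1)
    order = np.argsort(dists)[:top]
    return [(names[i], round(float(dists[i]), 4)) for i in order]

def best_two_way(target, names, coords, top=5):
    """For every pair (a, b), find the weight w in [0, 1] that minimises
    the distance between target and w*a + (1-w)*b, then rank the pairs."""
    results = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            a, b = coords[i], coords[j]
            d = a - b
            denom = float(d @ d)
            # Closed-form least-squares weight, clipped to a valid proportion.
            w = 0.5 if denom == 0 else float(np.clip((target - b) @ d / denom, 0.0, 1.0))
            fit = float(np.linalg.norm(target - (w * a + (1 - w) * b)))
            results.append((fit, w, names[i], names[j]))
    results.sort(key=lambda r: r[0])
    return [
        (f"{100 * w:.1f} % {na} + {100 * (1 - w):.1f} % {nb}", round(fit, 4))
        for fit, w, na, nb in results[:top]
    ]

if __name__ == "__main__":
    # Toy 2-dimensional "coordinates"; a real sheet would have 30 columns.
    names = ["A", "B", "C"]
    coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    target = np.array([0.25, 0.0])
    print(nearest_samples(target, names, coords))
    print(best_two_way(target, names, coords))
```

A 2-way fit can beat the best single sample because the mix is free to land anywhere on the line between the two references.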

MagnusDark
02-05-2020, 09:12 PM
......

Could you run my coordinates if it's no trouble?

MagnusDark_scaled,0.127482,0.150298,0.020365,-0.02584,0.028621,-0.007251,0.00611,0.002769,-0.002863,0.022233,0.001624,0.007343,-0.011596,0.003165,-0.02158,0.006895,0.030379,-0.00266,0.006159,-0.007379,-0.010606,0.000124,0.000986,0.000964,-0.002036

MagnusDark,0.0112,0.0148,0.0054,-0.008,0.0093,-0.0026,0.0026,0.0012,-0.0014,0.0122,0.001,0.0049,-0.0078,0.0023,-0.0159,0.0052,0.0233,-0.0021,0.0049,-0.0059,-0.0085,0.0001,0.0008,0.0008,-0.0017

vbnetkhio
02-05-2020, 09:13 PM
Could you run my coordinates if it's no trouble?

MagnusDark_scaled,0.127482,0.150298,0.020365,-0.02584,0.028621,-0.007251,0.00611,0.002769,-0.002863,0.022233,0.001624,0.007343,-0.011596,0.003165,-0.02158,0.006895,0.030379,-0.00266,0.006159,-0.007379,-0.010606,0.000124,0.000986,0.000964,-0.002036

MagnusDark,0.0112,0.0148,0.0054,-0.008,0.0093,-0.0026,0.0026,0.0012,-0.0014,0.0122,0.001,0.0049,-0.0078,0.0023,-0.0159,0.0052,0.0233,-0.0021,0.0049,-0.0059,-0.0085,0.0001,0.0008,0.0008,-0.0017

It's not G25, it's its own tool, so G25 coordinates can't be run on it.

firemonkey
02-05-2020, 11:12 PM
How the hell does this work? Where do we get the 30 co-ordinates from?

Dick
02-05-2020, 11:15 PM
How the hell does this work? Where do we get the 30 co-ordinates from?

€10 via Paypal and vbnetkhio and I will add more co-ordinates for you.

Wait, do you brits still use Euros? if not then US dollars please

Kaspias
02-06-2020, 08:19 AM
Nice work

Bosniensis
02-06-2020, 08:24 AM
I made a G25 inspired tool. it includes around 6000 modern samples and around 100 ancient.
if somebody's interested, you can test it on Vahaduo with the samples from the spreadsheet and post the results here.

https://vahaduo.github.io/vahaduo/

spreadsheet download:
https://drive.google.com/open?id=1q4B_KHZoL3jOO9xhM2pwFp8zJXKqmo0S

howtfthis works :mad:

errors everywhere on vahduhuoho

vbnetkhio
02-06-2020, 08:51 AM
howtfthis works :mad:

errors everywhere on vahduhuoho

It's in the testing phase, dr. Bosniensis :D
You can test the Bosnian samples from the sheet and see what they get.

Lucas
02-06-2020, 10:04 AM
€10 via Paypal and vbnetkhio and I will add more co-ordinates for you.

Wait, do you brits still use Euros? if not then US dollars please

So putting together the whole old HGDP and 1000 Genomes datasets and a few others from the Estonian Biocentre, without pruning outliers, is going to compete with the G25 coordinates, which encompass all possible modern and ancient datasets? (Think about why Davidski doesn't use all those samples: because it's useless.) :)

vbnetkhio
02-06-2020, 10:20 AM
So putting together whole old datasets of HGDP and 1000 Genomes and few others from Estonian Biocentre without pruning outliers (think why Davidski doesn't use all those samples, because it's useless) will be competition to G25 coordinates which encompass all possible modern and ancient datasets? :)

Dick is kidding, I just made this for fun.

Also, the second version gives very nice results for the Serb, like the K13 or G25 2-way oracle: Polish/Belarusian + Italian/Greek.

Lucas
02-06-2020, 10:23 AM
dick is kidding, i just made this for fun.

also the second version gives very nice results for the Serb, like k13 or g25 2way oracle, Polish/Belarusian + Italian/Greek

Ok:)

It would be interesting to see whether, if you made a 25-component PCA instead of 30, those coordinates would more or less work in G25. I thought about such an experiment some time ago, but I don't have the time.

You can check it.

vbnetkhio
02-06-2020, 10:26 AM
Ok:)

Interesting if you make not 30 but 25 component PCA and if those coordinates would work in G25 more or less. Some time ago I think about such experiment but I don't have time.

You can check it.

smartpca takes 30-60 minutes, even with this many samples. I wanted to make an admixture calculator, but it would take months to finish on my PC.
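For reference, switching a run from 30 to 25 components is mostly a matter of changing numoutevec in the smartpca parameter file. Here is a minimal Python sketch that builds such a file. The file prefixes are hypothetical, and numoutlieriter: 0 (no automatic outlier removal) is an assumption, not necessarily the setting used for this tool.

```python
def smartpca_par(prefix, out_prefix, n_pcs=30):
    """Build the text of a smartpca parameter file for EIGENSTRAT-format input."""
    lines = [
        f"genotypename: {prefix}.geno",
        f"snpname: {prefix}.snp",
        f"indivname: {prefix}.ind",
        f"evecoutname: {out_prefix}.evec",
        f"evaloutname: {out_prefix}.eval",
        f"numoutevec: {n_pcs}",   # number of principal components to output
        "numoutlieriter: 0",      # assumption: skip automatic outlier removal
    ]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # "merged" and "g25_style_test" are placeholder names.
    with open("smartpca.par", "w") as handle:
        handle.write(smartpca_par("merged", "g25_style_test", n_pcs=25))
    # Then run from a shell: smartpca -p smartpca.par > smartpca.log
```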

Lucas
02-06-2020, 10:29 AM
smartpca takes 30-60 minutes, even with this many samples. i wanted to make an admixture calculator, but it would take months to finish on my pc.

So can you check if it is possible?

vbnetkhio
02-06-2020, 10:39 AM
So can you check if it is possible?

OK, I can run the second version, the one with fewer ancient samples; it's more similar to G25.
But I think I would have to use the exact same samples Davidski used for it to work.

Lucas
02-06-2020, 10:40 AM
ok, i can run the second version, with less ancient samples, it's more similar to g25.
but i think i would have to use exact same samples davidski used for it to work.

Yes, to be 100% exact. But maybe the results would be, let's say, 80% similar.

vbnetkhio
02-06-2020, 10:41 AM
https://i.imgur.com/klZ3QZE.png

classic model for a Serbian sample

it's similar to g25 Serbian results (Decius)

https://www.theapricity.com/forum/showthread.php?302219-My-G25-results&p=6258846&viewfull=1#post6258846

vbnetkhio
02-06-2020, 11:07 AM
So putting together whole old datasets of HGDP and 1000 Genomes and few others from Estonian Biocentre without pruning outliers (think why Davidski doesn't use all those samples, because it's useless) will be competition to G25 coordinates which encompass all possible modern and ancient datasets? :)

It's HGDP, Reich Human Origins (which includes 1000 Genomes and SGDP), and everything from the Estonian Biocentre, not just some samples.
And I removed those tri-racial samples from Argentina, Peru, Puerto Rico, Afro-Americans, Afro-Caribbeans, chimp, gorilla (lol) and all ancient samples with less than 90% of the SNPs.
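The last filter (dropping ancient samples with less than 90% of the SNPs) could look roughly like this in Python. It's only a sketch: it assumes genotypes coded 0/1/2 with 9 for missing, as in EIGENSTRAT .geno files, and the function name and toy data are hypothetical. It is not the script actually used here.

```python
import numpy as np

MISSING = 9  # EIGENSTRAT code for a missing genotype

def filter_by_call_rate(names, genotypes, is_ancient, min_rate=0.90):
    """Keep all modern samples, and ancient samples whose share of
    non-missing SNPs is at least min_rate."""
    genotypes = np.asarray(genotypes)
    rates = (genotypes != MISSING).mean(axis=1)  # rows = samples, columns = SNPs
    return [
        name
        for name, ancient, rate in zip(names, is_ancient, rates)
        if not ancient or rate >= min_rate
    ]

if __name__ == "__main__":
    names = ["modern1", "ancient_ok", "ancient_low"]
    genotypes = np.ones((3, 20), dtype=int)
    genotypes[0, :10] = MISSING   # modern sample: never filtered on coverage
    genotypes[1, :1] = MISSING    # 95% of SNPs present
    genotypes[2, :5] = MISSING    # 75% of SNPs present
    print(filter_by_call_rate(names, genotypes, [False, True, True]))
```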

Bosniensis
02-06-2020, 11:11 AM
Man I love these calcs.

I hope to try at least 500 more

I'll make a calc as well

ph2ter
02-06-2020, 11:43 AM
More coordinates = more computing power needed

=not very practical

Lucas
02-06-2020, 11:44 AM
More coordinates = more computing power needed

=not very practical

I guess that was one reason why Davidski kept only the most representative samples for each population rather than all possible ones, besides pruning obvious outliers.

firemonkey
02-06-2020, 01:00 PM
€10 via Paypal and vbnetkhio and I will add more co-ordinates for you.

Wait, do you brits still use Euros? if not then US dollars please

Is vbnetkhio you using an alternative account then? Or should there be a comma before the second 'and'? I have no idea what email address to send the payment to.

Lucas
02-06-2020, 01:08 PM
Is vbnetkhio you using an alternative account then ? Or should there be a comma before the second 'and' ? I have no idea as to what email address to send the payment to.

Read this post https://www.theapricity.com/forum/showthread.php?314681-BCE-G30-beta-PCA-with-6000-samples&p=6488194&viewfull=1#post6488194

firemonkey
02-06-2020, 01:56 PM
Read this post https://www.theapricity.com/forum/showthread.php?314681-BCE-G30-beta-PCA-with-6000-samples&p=6488194&viewfull=1#post6488194

So basically it's a load of crap. The impression I get is that there's been a boom in people wanting to develop calculators, with it being a case of "Never mind the quality, feel the width".

Dick
02-06-2020, 02:03 PM
So basically it's a load of crap . The impression I get is that there's been a boom in people wanting to develop calculators , with it being a case of "Never mind the quality feel the width" .

It’s not the length that counts, it’s the girth. Said no woman ever.

vbnetkhio
02-06-2020, 02:41 PM
More coordinates = more computing power needed

=not very practical

100 PCs took an hour and a half or so to calculate on my weak PC, 30 PCs around half an hour. It's very fast.

vbnetkhio
02-06-2020, 02:47 PM
So basically it's a load of crap . The impression I get is that there's been a boom in people wanting to develop calculators , with it being a case of "Never mind the quality feel the width" .

Actually, nobody is making new calcs lately. The last really good one was Eurogenes K13, which came out more than 10 years ago.

Loads of new data have been published since K13, and nobody is making use of it.
I wanted to see if anything changes when I include that data, and whether it can be pushed further than K13.

Only Supreeeeme, Tolan and I made some calculators in the last year. Out of the projects on Gedmatch, only Eurogenes is still active. How is that a boom?

vbnetkhio
02-07-2020, 03:33 PM
https://i.imgur.com/gl7bDte.png

Leto
02-07-2020, 06:04 PM
ERROR! Column number mismatch.

I pasted the spreadsheets and my coords, but it won't work for me.
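One possible cause (a guess, not a confirmed diagnosis): the sheet has 30 coordinates per sample, while G25 coordinates have 25, so the two don't line up. Here is a quick Python check to run on the pasted text before blaming the tool. The function names are made up, and this is not Vahaduo's own validation code.

```python
def column_counts(text):
    """Return the set of coordinate counts (fields after the sample name) per row."""
    counts = set()
    for line in text.strip().splitlines():
        if line.strip():
            counts.add(len(line.split(",")) - 1)
    return counts

def columns_match(source_text, target_text):
    """True only if every source and target row has the same number of coordinates."""
    source, target = column_counts(source_text), column_counts(target_text)
    return len(source) == 1 and source == target

if __name__ == "__main__":
    source = "Ref_sample," + ",".join(["0.01"] * 30)   # 30-column reference row
    target = "My_coords," + ",".join(["0.01"] * 25)    # 25-column G25-style row
    print(column_counts(source), column_counts(target))
    print(columns_match(source, target))
```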

celticdragongod
02-07-2020, 09:48 PM
https://i.imgur.com/gl7bDte.png

Where are the Celts?

vbnetkhio
02-07-2020, 10:16 PM
Where are the Celts?

Welsh and Orcadians are between British and Germanic, more northern-shifted than most of the "British" samples.
Germanic is mostly Swedes; most Germans are down there with the French. Maybe I should have called it Scandinavian.