G25 is not that good...?
The PCA-driven Eurogenes G25 calculator has been a topic of great interest on this forum in discussions of autosomal tests. Because of that, I saw a need to examine the method used by G25 and replicate it to test its accuracy.
METHODOLOGY
Using PLINK, I ran a PCA on samples gathered from the Estonian Biocentre (for European, Asian and Amerindian samples) and Henn et al. (for African samples).
The PCA had 20 dimensions, instead of the 25 used by G25, and quality control was done with geno set to 20%.
After the PLINK quality control and the PCA, I also removed individuals that fell far from the cluster they were expected to be in (one example being a San sample that appeared between the Europeans and the Africans, probably a consequence of colonialism).
The eigenvectors obtained from the PCA were scaled using the eigenvalues, and the data was then sent to Vahaduo for admixture estimation.
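The steps above can be sketched in Python. This is only an illustration under assumptions: the post does not give the exact command line or the exact scaling formula. The file prefixes `merged` and `pca20` are hypothetical, and multiplying each component by the square root of its eigenvalue is one common choice, not necessarily what was done here or in G25. The PLINK 1.9 options `--bfile`, `--geno`, `--pca` and `--out` are real; the command is built but not run.

```python
import math

def plink_pca_command(bfile, out, geno=0.2, n_pcs=20):
    """Build the PLINK 1.9 argument list for SNP filtering plus PCA.

    bfile and out are hypothetical file prefixes; the command is not run here.
    """
    return ["plink", "--bfile", bfile,
            "--geno", str(geno),   # drop SNPs with more than 20% missing calls
            "--pca", str(n_pcs),   # writes <out>.eigenvec and <out>.eigenval
            "--out", out]

def scale_coordinates(eigenvecs, eigenvals):
    """Scale each PC column by sqrt(eigenvalue).

    ASSUMPTION: the original post only says the eigenvectors were scaled
    "using the eigenvalues"; the square root is an illustrative choice.
    """
    factors = [math.sqrt(v) for v in eigenvals]
    return [[x * f for x, f in zip(row, factors)] for row in eigenvecs]

if __name__ == "__main__":
    print(" ".join(plink_pca_command("merged", "pca20")))
    # One sample with two PCs and made-up eigenvalues.
    print(scale_coordinates([[0.1, -0.2]], [4.0, 9.0]))
```

Scaling matters because unscaled eigenvectors give every dimension the same weight, while scaled ones let the components that explain more variance dominate the distances used later.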
RESULTS
Although the results were broadly consistent for the samples tested, they were not that accurate, either for the continental residual percentages (up to 6%), which often lacked certain well-known residuals or attributed false ones (Southeast Asian with SSA), or for the intracontinental percentages (not only the residuals), where the most apparent problems were in West Eurasia.
In conclusion, further testing is required to see whether the methodology can be perfected or has inherent flaws, but in this case it had trouble reaching the high accuracy that forum members attribute to it.
REFERENCES
http://www-evo.stanford.edu/repository/paper0002/
https://www.cog-genomics.org/plink/1.9/
https://evolbio.ut.ee/
https://vahaduo.github.io/vahaduo/
gixajo
02-14-2022, 05:05 PM
We already know that it is not perfect, but, is there anything better than the G25 right now?
We already know that it is not perfect, but, is there anything better than the G25 right now?
There are various methods more efficient than the PCA-driven one, but for the moment it's very time consuming to do them and it also requires some bioinformatics knowledge.
FinalFlash
02-14-2022, 05:10 PM
We already know that it is not perfect, but, is there anything better than the G25 right now?
qpAdm?
qpAdm?
In all methods, statistical errors tended to decrease, but estimates from FA and qpAdm were still more accurate than those computed with PC projections and STRUCTURE.
https://www.nature.com/articles/s41467-020-18335-6
FinalFlash
02-14-2022, 05:57 PM
In all methods, statistical errors tended to decrease, but estimates from FA and qpAdm were still more accurate than those computed with PC projections and STRUCTURE.
https://www.nature.com/articles/s41467-020-18335-6
The problem is that qpAdm isn't as easily accessible or user-friendly as G25 for example.
The problem is that qpAdm isn't as easily accessible or user-friendly as G25 for example.
Yes, like I stated previously, there are methods more efficient but not friendly for the common folk.
Slavic Italian
02-14-2022, 06:05 PM
I can plug in so many ethnicities and get similar results. You can manipulate it easily. It's very average.
kingmob
02-14-2022, 06:20 PM
Your post is very informative and touches on things that I've suspected also but was unable to structure them into specifics due to lack of your technical knowledge from my part.
My suspicions were initially formed when I saw how different plots would form, depending on the references used, for example how the Euro cluster would change in relation to each European sub-cluster, depending whether Turkic or East Asian references were included or how one could learn how to manipulate the result by abusing midway points on the PCA, etc.
But what really ended up annoying me is how there's a big discussion nowadays that attempts to re-write historical and archeological evidence just because some "g25 model says so".
So, I banged my head against the wall for a while but I learned how to use qpAdm on a virtual machine, so now I do that when I can.
gixajo
02-14-2022, 06:26 PM
There are various methods more efficient than the PCA-driven one, but for the moment it's very time consuming to do them and it also requires some bioinformatics knowledge.
Well, if it is more complicated to achieve and more knowledge is needed to use it correctly, it is not better, but not more precise.
Can you do it?
gixajo
02-14-2022, 06:39 PM
The problem is that qpAdm isn't as easily accessible or user-friendly as G25 for example.
I agree, that's what I meant when I said that G25 was "better".
Edit: I deleted the previous message.
gixajo
02-14-2022, 06:50 PM
And also... When you talk about using qpAdm, are you talking about using it directly with raw data, or making some kind of coordinates with it?
gixajo
02-14-2022, 06:55 PM
I can plug in so many ethnicities and get similar results. You can manipulate it easily. It's very average.
That problem would be similar in every system you could use. Results depend on references with which you compare something.
But they are speaking about the "accuracy" of the system itself: how the coordinates are produced, and the reliability of the algorithm that calculates the solutions from those coordinates (if I understand correctly what they are talking about).
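To make "the algorithm that calculates the solutions with those coordinates" concrete, here is a minimal sketch of the general idea: find non-negative weights that sum to 1 so that the weighted average of the reference coordinates lands as close as possible to the target. This is not Vahaduo's actual implementation, which the thread does not describe; the function names and the two-dimensional coordinates are made up for illustration.

```python
def mix(weights, sources):
    """Weighted average of the source coordinates."""
    dims = len(sources[0])
    return [sum(w * s[d] for w, s in zip(weights, sources)) for d in range(dims)]

def sq_dist(a, b):
    """Squared Euclidean distance between two coordinate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fit_mixture(target, sources, step=0.25, min_step=1e-6):
    """Hill-climb on the weights: shift weight from one source to another
    whenever that brings the mixture closer to the target, halving the
    step size when no shift helps. Weights stay >= 0 and sum to 1."""
    n = len(sources)
    weights = [1.0 / n] * n
    best = sq_dist(mix(weights, sources), target)
    while step >= min_step:
        improved = False
        for i in range(n):
            for j in range(n):
                delta = min(step, weights[i])
                if i == j or delta <= 0.0:
                    continue
                trial = list(weights)
                trial[i] -= delta
                trial[j] += delta
                d = sq_dist(mix(trial, sources), target)
                if d < best - 1e-15:
                    weights, best, improved = trial, d, True
        if not improved:
            step /= 2.0
    return weights, best ** 0.5

if __name__ == "__main__":
    # Hypothetical 2-D coordinates; the target is built as 70% A + 30% B.
    refs = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
    weights, distance = fit_mixture([0.7, 0.3], refs)
    print([round(w, 3) for w in weights], round(distance, 6))
```

The fit only ever sees the references it is given, which is why the choice of references shapes the result in any system of this kind.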
gixajo
02-14-2022, 07:00 PM
In all methods, statistical errors tended to decrease, but estimates from FA and qpAdm were still more accurate than those computed with PC projections and STRUCTURE.
Well, and after the criticism of the G25 and its inconsistencies, with which we all undoubtedly agree, (even probably the ones who created it), what do you propose to replace it?
Your post is very informative and touches on things that I've suspected also but was unable to structure them into specifics due to lack of your technical knowledge from my part.
My suspicions were initially formed when I saw how different plots would form, depending on the references used, for example how the Euro cluster would change in relation to each European sub-cluster, depending whether Turkic or East Asian references were included or how one could learn how to manipulate the result by abusing midway points on the PCA, etc.
But what really ended up annoying me is how there's a big discussion nowadays that attempts to re-write historical and archeological evidence just because some "g25 model says so".
So, I banged my head against the wall for a while but I learned how to use qpAdm on a virtual machine, so now I do that when I can.
Glad to help!
Well, if it is more complicated to achieve and more knowledge is needed to use it correctly, it is not better, but not more precise.
Can you do it?
It is better and more precise, for the accuracy is greater.
Yes, I can do it.
Well, and after the criticism of the G25 and its inconsistencies, with which we all undoubtedly agree, (even probably the ones who created it), what do you propose to replace it?
Perhaps a calculator I'm currently developing?
Even G25 is nowhere near as user-friendly as Gedmatch. For starters G25 should become free. I don't know why the method cannot be somehow stolen from Davidski and leaked into the open. Fuck that monopolist!
Even G25 is nowhere near as user-friendly as Gedmatch. For starters G25 should become free. I don't know why the method cannot be somehow stolen from Davidski and leaked into the open. Fuck that monopolist!
I know how to make something like that, but the problem is that given the specific database used by Davidski, it would require trial and error to get consistent results.
gixajo
02-14-2022, 08:52 PM
Perhaps a calculator I'm currently developing?
That's fantastic, I am looking forward to trying it.:thumb001:
So should we take the criticism of the G25 as the typical commercial strategy of discrediting or eroding the image of a long-established product in order to place our new product on the market?
Or is it really a sincere and disinterested criticism?
That's fantastic, I am looking forward to trying it.:thumb001:
So should we take the criticism of the G25 as the typical commercial strategy of discrediting or eroding the image of a long-established product in order to place our new product on the market?
Or is it really a sincere and disinterested criticism?
The calculator I'm making is not only a personal project but it will also be free, so take it as a sincere and disinterested criticism.
gixajo
02-14-2022, 09:15 PM
The calculator I'm making is not only a personal project but it will also be free, so take it as a sincere and disinterested criticism.
I don't see any problem that if it's really worth it, you make money with it.
I'm sure you've spent a lot of time on the project, but I'm also sure you've enjoyed doing it.
I don't see any problem that if it's really worth it, you make money with it.
I'm sure you've spent a lot of time on the project, but I'm also sure you've enjoyed doing it.
Perhaps I can make money with a future project or donations, but this one will be free.
gixajo
02-14-2022, 09:38 PM
Perhaps I can make money with a future project or donations, but this one will be free.
Well, offering it for free is a great way to get volunteers to try it out, and thus correct possible initial failures, fix them and once the final product is ready in its final version, start asking for money.
Anyway, good luck, and I hope that the final product is excellent and meets the expectations you have in it.
Well, offering it for free is a great way to get volunteers to try it out, and thus correct possible initial failures, fix them and once the final product is ready in its final version, start asking for money.
Anyway, good luck, and I hope that the final product is excellent and meets the expectations you have in it.
Yeah, it is a good idea what you said, but one thing I see is missing for people is to have a free and reliable calculator they can count on, especially for the ones with not enough money to pay. The maximum I could do about asking money would be to give some minor benefits to the ones who give money, like perhaps be able to customize interface and have access to other methods of data visualization (heatmaps, pie charts, et cetera).
And thank you, I also wish it meets my expectations. I plan on starting the coding for it tomorrow, it will be developed in Python instead as I'm more well experienced with it than with R to be honest, and I also plan on perhaps making a website for showing the results because I really want it to be very user-friendly.
If you want, I can update you every time I do something big.
kingmob
02-15-2022, 05:24 AM
Yeah, it is a good idea what you said, but one thing I see is missing for people is to have a free and reliable calculator they can count on, especially for the ones with not enough money to pay. The maximum I could do about asking money would be to give some minor benefits to the ones who give money, like perhaps be able to customize interface and have access to other methods of data visualization (heatmaps, pie charts, et cetera).
And thank you, I also wish it meets my expectations. I plan on starting the coding for it tomorrow, it will be developed in Python instead as I'm more well experienced with it than with R to be honest, and I also plan on perhaps making a website for showing the results because I really want it to be very user-friendly.
If you want, I can update you every time I do something big.
I'd like to commend you for your effort and underline the importance of OPEN-SOURCE endeavors like ADMIXTURE and qpAdm and, of course, your own project. It's very important to be able to replicate the validity of the process for transparency and objectivity and not have it hidden a) behind a paywall, b) non-disclosed methodology and process of how the whole thing works, like g25 does.
I am looking forward to your work and I support your idea to have a donation page where people can chip in for your time and creativity.
Lucas
02-15-2022, 07:50 AM
First of all, G25 is based on SmartPCA, not PLINK PCA.
Second, there is a thread on TA by vbnetkhio where he tried to build a G30, so you are not the first.
vbnetkhio
02-15-2022, 08:12 AM
First of all, G25 is based on SmartPCA, not PLINK PCA.
Second, there is a thread on TA by vbnetkhio where he tried to build a G30, so you are not the first.
it's not rocket science :) it's just difficult to choose the reference samples, if you just throw the entire global dataset in, the differences between some African and Asian groups are overblown and European groups are closer to each other. as you remove the Africans it changes. but even if you do that you just get another G25 which already exists, so why bother.
I'd like to commend you for your effort and underline the importance of OPEN-SOURCE endeavors like ADMIXTURE and qpAdm and, of course, your own project. It's very important to be able to replicate the validity of the process for transparency and objectivity and not have it hidden a) behind a paywall, b) non-disclosed methodology and process of how the whole thing works, like g25 does.
I am looking forward to your work and I support your idea to have a donation page where people can chip in for your time and creativity.
Thank you again! As a scientist/biologist, unfortunately that lack of transparency is found in my area, with paid articles and hidden samples. I'm glad to be helping you guys at least have something to work with and maybe also inspire you all to try the same.
First of all, G25 is based on SmartPCA, not PLINK PCA.
Second, there is a thread on TA by vbnetkhio where he tried to build a G30, so you are not the first.
The methodology is similar, it's PCA-driven.
Also, my project will not use PCA, it has its own methodology, so yes, I am first on that.
vbnetkhio
02-15-2022, 01:08 PM
did you do any SNP filtering? if you used more than 300k SNPs, maybe this caused the problems with intercontinental ancestry you describe.
did you do any SNP filtering? if you used more than 300k SNPs, maybe this caused the problems with intercontinental ancestry you describe.
Geno is the option for SNP filtering that I used, the total SNP count was less than 300k SNPs after the filtering (before it was more than a million).
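For readers unsure what "geno" does, here is a toy illustration of the two PLINK missingness filters. In PLINK, `--geno` removes SNPs (variants) whose missing call rate exceeds the threshold, while `--mind` removes individuals (samples) on the same basis. The code below is not PLINK itself: the tiny genotype table and the function names are made up, and `None` stands for a missing call.

```python
def missing_rate(calls):
    """Fraction of calls that are missing (None)."""
    return sum(1 for c in calls if c is None) / len(calls)

def filter_geno(genotypes, max_missing=0.2):
    """Like --geno: return the indices of SNPs (columns) to keep."""
    n_snps = len(genotypes[0])
    return [s for s in range(n_snps)
            if missing_rate([row[s] for row in genotypes]) <= max_missing]

def filter_mind(genotypes, max_missing=0.2):
    """Like --mind: return the indices of samples (rows) to keep."""
    return [i for i, row in enumerate(genotypes)
            if missing_rate(row) <= max_missing]

if __name__ == "__main__":
    # Rows are samples, columns are SNPs; values are made-up allele counts.
    table = [
        [0, 1, None],
        [1, None, None],
        [2, 1, 0],
        [0, 0, 1],
        [1, 2, 2],
    ]
    print("SNPs kept:", filter_geno(table))
    print("samples kept:", filter_mind(table))
```

With a threshold of 0.2, the third SNP is dropped because two of its five calls are missing, which is how a set of over a million SNPs can shrink to under 300k.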
vbnetkhio
02-15-2022, 01:23 PM
Geno is the option for SNP filtering that I used, the total SNP count was less than 300k SNPs after the filtering (before it was more than a million).
yeah sorry, I confused it with "mind".
but G25 doesn't have these problems with continental ancestry, so it's possible to make a PCA without such issues, with the right settings.
maybe the problem is caused by the software used (plink vs. smartpca)
yeah sorry, I confused it with "mind".
but G25 doesn't have these problems with continental ancestry, so it's possible to make a PCA without such issues, with the right settings.
maybe the problem is caused by the software used (plink vs. smartpca)
Perhaps... like I stated in the conclusions, it's still required to do further testing.