qpAdm thread

**Zoro** · 02-24-2021, 10:57 PM

Originally Posted by Korialstrasz

I have got these results:

Code:

left       weight  se     z
Adygei     0.439   0.595  0.738
Turkmen    0.348   0.304  1.14
Bulgarian  0.213   0.404  0.527

Not bad for a first attempt, I guess. If you put closely related populations together, you will likely get negative estimates with very high standard errors.
I would like to reiterate that I have no prior knowledge of population genetics and am quite ignorant compared to other Apricians. For all I know, what I attempted might just be bullshit.
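In the table above, z is simply weight divided by se, and the three weights sum to 1. The minimal Python sketch below recomputes z and flags weights that are either negative or not clearly different from zero. The |z| >= 2 cut-off is a common rule of thumb and is assumed here; it does not come from the thread.

```python
# (left pop, weight, se) copied from the qpAdm table above
ROWS = [
    ("Adygei", 0.439, 0.595),
    ("Turkmen", 0.348, 0.304),
    ("Bulgarian", 0.213, 0.404),
]


def z_scores(rows):
    """Return {left pop: weight / se}, rounded to 3 decimals."""
    return {pop: round(weight / se, 3) for pop, weight, se in rows}


def weak_weights(rows, min_abs_z=2.0):
    """Left pops whose weight is negative or whose |z| is below the assumed cut-off."""
    return [pop for pop, weight, se in rows
            if weight < 0 or abs(weight / se) < min_abs_z]


if __name__ == "__main__":
    print(z_scores(ROWS))      # {'Adygei': 0.738, 'Turkmen': 1.145, 'Bulgarian': 0.527}
    print(weak_weights(ROWS))  # ['Adygei', 'Turkmen', 'Bulgarian']
    print(round(sum(w for _, w, _ in ROWS), 6))  # 1.0
```

The table shows 1.14 for Turkmen, and the sketch gives 1.145; the difference is only rounding. All three |z| values are below 2 here, so none of the weights is well constrained.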

I would also like to add that, after extracting the f2 statistics, I get these messages:

✔ 1034771 SNPs read in total
! 1331 SNPs remain after filtering. 1331 are polymorphic.
ℹ Allele frequency matrix for 1331 SNPs and 22 populations is 0 MB

I am not sure if this is normal, but it seemed suspicious to me, as it eliminates almost all of the SNPs. (I checked how many SNPs my FTDNA data and the Reich dataset have in common, and it turned out to be a little under 130k. Both the Reich data and my raw data have almost 600k SNP lines, a significant share of which I believe are either no-calls or missing values.)
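The overlap check described in the parenthesis above can be scripted. This is a minimal sketch that rests on two assumptions. First, it assumes a 23andMe-style raw file: tab-separated rsid, chromosome, position and genotype, with '#' comment lines and '--' for no-calls. Second, it assumes an EIGENSTRAT .snp file for the reference dataset, where the first column is the SNP ID. The small in-memory lines stand in for real files, and the file names in the comment are hypothetical.

```python
def called_rsids(raw_lines, no_calls=("--", "00")):
    """rsIDs that have an actual genotype call in a 23andMe-style raw file."""
    ids = set()
    for line in raw_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and '#' comment/header lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 4 and fields[3] not in no_calls:
            ids.add(fields[0])
    return ids


def reference_snp_ids(snp_lines):
    """SNP IDs from an EIGENSTRAT .snp file (first whitespace-separated column)."""
    return {line.split()[0] for line in snp_lines if line.strip()}


# Toy stand-ins. With real data you might pass open("my_raw.txt") and
# open("v44.3_HO_public.snp") instead (hypothetical paths).
RAW = [
    "# rsid\tchromosome\tposition\tgenotype",
    "rs1\t1\t100\tAG",
    "rs2\t1\t200\t--",
    "rs3\t1\t300\tCC",
]
SNP = [
    "rs1 1 0.0 100 A G",
    "rs2 1 0.0 200 C T",
    "rs4 1 0.0 400 G A",
]

if __name__ == "__main__":
    shared = called_rsids(RAW) & reference_snp_ids(SNP)
    print(len(shared), sorted(shared))  # 1 ['rs1']
```

This only counts the overlap between your own kit and the reference. Under maxmiss = 0, each additional population extracted together can only keep that set the same size or make it smaller.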

First, congrats on getting the software running, and absolutely no to the idea that you're more ignorant than other people here. In fact, 98% of the people here wouldn't even have a clue what you just wrote. I would say you're more knowledgeable than 98% of the people here!

OK, so here are a few observations and tips:

1- "! 1331 SNPs remain after filtering. 1331 are polymorphic." This is absolutely not acceptable. It will give you horrible results, and it is in fact mostly responsible for the huge standard errors you got (the Turkmen SE of 0.304 is almost as large as its 0.348 weight). Although it's very important for ADMIXTOOLS 2 not to have missing SNPs in any of your samples (in other words, maxmiss=0), it's just as important that you salvage at least 100,000 SNPs. Drop low-coverage samples if you have to.

2- Assuming you are able to get close to 100,000 SNPs, if you still get high SEs it means your left pops are too closely related and your right pops are unable to properly distinguish between them. So add some right pops that are very differently related to one left pop than to the other.

3- Let me know if you need a simple script to convert your FTDNA or Ancestry data to 23andMe format.
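The script Zoro offers in point 3 isn't posted in the thread. To give a rough idea of what such a conversion involves, here is a minimal Python sketch. It assumes the FTDNA file is CSV with the header RSID,CHROMOSOME,POSITION,RESULT, where values may be quoted. It also assumes the 23andMe output is tab-separated with a "# rsid chromosome position genotype" header. Check both layouts against your actual files before relying on it.

```python
import csv


def ftdna_to_23andme(ftdna_lines):
    """Convert FTDNA-style CSV lines to 23andMe-style tab-separated lines.

    Assumed input header: RSID,CHROMOSOME,POSITION,RESULT (fields may be quoted).
    '--' no-calls are passed through unchanged, assuming 23andMe uses the same code.
    """
    data = (line for line in ftdna_lines if line.strip() and not line.startswith("#"))
    out = ["# rsid\tchromosome\tposition\tgenotype"]
    for row in csv.DictReader(data):
        out.append("\t".join((row["RSID"], row["CHROMOSOME"], row["POSITION"], row["RESULT"])))
    return out


# Made-up sample rows in the assumed FTDNA layout
SAMPLE = [
    "RSID,CHROMOSOME,POSITION,RESULT",
    '"rs1","1","100","AG"',
    '"rs2","X","200","--"',
]

if __name__ == "__main__":
    for line in ftdna_to_23andme(SAMPLE):
        print(line)
```

To produce a file, pass the lines of the FTDNA CSV to the function and write "\n".join(result) + "\n" to a new .txt file. The file names are up to you.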

You're on a good track. Good luck!

**Korialstrasz** · 02-25-2021, 08:08 PM

Thanks! Your instructions have been tremendously helpful. I think I managed to convert the FTDNA file myself without losing any SNPs, but I don't know if I missed anything.

I took the excerpt below from the admixtools documentation; it's pretty much in line with what you advise:

By default, extract_f2() will be very cautious and exclude all SNPs which are missing in any population (maxmiss = 0). If you lose too many SNPs this way, you can either

* limit the number of populations for which to extract f2-statistics,
* compute f3- and f4-statistics directly from genotype files, or
* increase the maxmiss parameter (maxmiss = 1 means no SNPs will be excluded).
The advantages and disadvantages of the different approaches are described here. Briefly, when running qpadm() and qpdstat(), it can be better to choose the safer but slower options 1 and 2, while for qpgraph(), which is not centered around hypothesis testing, it is usually fine to choose option 3. Since the absolute difference in f-statistics between these approaches is usually small, it can also make sense to use option 3 for exploratory analyses and confirm key results using options 1 or 2.
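To make the maxmiss option concrete, consider the two endpoints in the excerpt: maxmiss = 0 drops any SNP that is missing in any population, and maxmiss = 1 drops none. Both are consistent with reading maxmiss as the largest tolerated fraction of populations with missing data at a SNP. The Python sketch below applies that reading to a toy missingness table. It mimics the documented behaviour and is not admixtools' own code.

```python
def maxmiss_filter(missing, maxmiss):
    """Return SNP IDs whose fraction of populations with missing data is <= maxmiss.

    missing: dict mapping SNP ID -> list of booleans, one per population
             (True means that population has no data at this SNP).
    """
    kept = []
    for snp, flags in missing.items():
        if sum(flags) / len(flags) <= maxmiss:
            kept.append(snp)
    return kept


# Toy table: 4 populations, 3 SNPs
MISSING = {
    "rs1": [False, False, False, False],  # complete in all pops
    "rs2": [True, False, False, False],   # missing in 1 of 4 pops
    "rs3": [True, True, False, False],    # missing in 2 of 4 pops
}

if __name__ == "__main__":
    for m in (0, 0.25, 1):
        print(m, maxmiss_filter(MISSING, m))
```

With maxmiss = 0 only rs1 survives, with 0.25 rs1 and rs2 survive, and with 1 all three are kept. This is the trade-off the excerpt describes: more SNPs, but some f-statistics are computed on SNPs that not every population has.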

I tried different maxmiss values to salvage some SNPs, but the models I ran afterwards did not make much sense. It seems I need to try different sets of populations. I had the impression that right-hand-side populations function akin to a "control variable". Would it then make sense to run an analysis on modern populations using, let's say, Iron Age samples as right pops, as long as they provide enough "control" for the left? Or is it better not to take too many liberties in this regard?

I'll be reading the instructions here: https://www.biorxiv.org/content/bior...ed/media-1.pdf

**Zoro** · 02-26-2021, 01:49 AM

Post a couple of runs here showing all the details of the output, such as the number of SNPs and the left and right pops, and I'll try to diagnose them for you. I would use maxmiss=0.002 or 0.003.
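This sketch uses the same reading of maxmiss as above: the largest tolerated fraction of populations missing a SNP. Under that reading, how much a value like 0.002 or 0.003 relaxes the filter depends on how many populations go into extract_f2(). The population counts below are chosen purely for illustration; 22 is the count from the log earlier in the thread.

```python
import math


def max_missing_pops(maxmiss, n_pops):
    """Most populations a SNP may be missing in and still pass (missing / n_pops <= maxmiss).

    The tiny epsilon guards against float products like 2.9999999999999996.
    """
    return math.floor(maxmiss * n_pops + 1e-9)


if __name__ == "__main__":
    for n in (22, 500, 1000):
        print(n, [max_missing_pops(m, n) for m in (0.002, 0.003)])
```

If that reading is right, both values behave exactly like maxmiss = 0 when only about 22 populations are extracted. They start letting extra SNPs through only once several hundred populations are extracted together.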

**vbnetkhio** · 02-26-2021, 02:37 PM

Some might find this useful:

I made an AncestryDNA raw data to .ped converter script for R:
Attachment 106242

To run it, rename your raw data file to "data.txt", place "data.txt" and "anc_to_ped.r" in your R working directory, and run this command in R: source("anc_to_ped.r")

The file has to be in the AncestryDNA format.
If you have a different format, e.g. 23andMe, you can convert it first with DNA Kit Studio (don't use a raw data template, just choose the AncestryDNA format).
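vbnetkhio's R script is an attachment and isn't reproduced here. As an illustration of what an AncestryDNA-to-PED conversion has to produce, here is a minimal Python sketch. It assumes tab-separated AncestryDNA rows (rsid, chromosome, position, allele1, allele2), '#' comment lines, a header row starting with "rsid", and "0" for no-call alleles. The family and individual IDs are placeholders.

```python
def ancestry_to_ped_map(lines, fid="FAM1", iid="ME", sex="0"):
    """Build one PLINK .ped line and the matching .map lines from AncestryDNA-style rows.

    fid, iid and sex are placeholder values (sex "0" = unknown in PLINK).
    """
    map_lines = []
    alleles = []
    for line in lines:
        if not line.strip() or line.startswith("#") or line.startswith("rsid"):
            continue  # skip blanks, comments and the column header
        rsid, chrom, pos, a1, a2 = line.rstrip("\r\n").split("\t")  # expects exactly 5 fields
        map_lines.append(f"{chrom}\t{rsid}\t0\t{pos}")  # chr, SNP ID, genetic distance, bp
        alleles.extend([a1, a2])  # "0" no-calls are also PLINK's missing-allele code
    # .ped: family ID, individual ID, father, mother, sex, phenotype (-9 = missing), alleles
    ped_line = " ".join([fid, iid, "0", "0", sex, "-9"] + alleles)
    return ped_line, map_lines


SAMPLE = [
    "#AncestryDNA raw data (made-up example)",
    "rsid\tchromosome\tposition\tallele1\tallele2",
    "rs1\t1\t100\tA\tG",
    "rs2\t2\t200\t0\t0",
]

if __name__ == "__main__":
    ped, mp = ancestry_to_ped_map(SAMPLE)
    print(ped)
    print("\n".join(mp))
```

Each SNP adds exactly one .map line and two .ped columns. PLINK checks that invariant when it reads the pair.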

**Kaspias** · 02-26-2021, 06:14 PM

So I felt the need to learn how to run qpAdm, but I'm pretty much a beginner with these tools.

I get this error while trying to create the 3rd file in PLINK:

Code:

1426149 (of 1426149) markers to be included from [ data.map ]

ERROR:
A problem with line 1 in [ data.ped ]
Expecting 6 + 2 * 1426149 = 2852304 columns, but found 2842162
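PLINK's message comes down to simple arithmetic. A .ped line needs 6 leading columns plus 2 allele columns per SNP listed in the .map file, so 6 + 2 × 1,426,149 = 2,852,304. The line actually had 2,842,162 columns, which is 10,142 columns (5,071 SNPs' worth) short. Below is a small Python sketch for checking a .ped/.map pair before running PLINK. The file names in the comment are the ones from the error message.

```python
def expected_ped_columns(n_snps):
    """6 leading fields (FID, IID, father, mother, sex, phenotype) + 2 alleles per SNP."""
    return 6 + 2 * n_snps


def ped_column_problems(ped_lines, n_map_snps):
    """List (line number, expected, found) for .ped lines with the wrong column count."""
    expected = expected_ped_columns(n_map_snps)
    problems = []
    for lineno, line in enumerate(ped_lines, start=1):
        found = len(line.split())
        if found != expected:
            problems.append((lineno, expected, found))
    return problems


if __name__ == "__main__":
    print(expected_ped_columns(1426149))  # 2852304
    # With the real files (each .ped line is read whole into memory):
    # with open("data.map") as m, open("data.ped") as p:
    #     n = sum(1 for line in m if line.strip())
    #     print(ped_column_problems(p, n))
    print(ped_column_problems(["F1 I1 0 0 0 -9 A G C"], 2))  # [(1, 10, 9)]
```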

**vbnetkhio** · 02-26-2021, 06:23 PM

Did you use my script?
There's a bug, of course... I'll try to fix it.

Edit:
Everything seems to work fine for me. What are you trying to do with the file?

**Kaspias** · 02-26-2021, 08:05 PM

I was following Korialstrasz's entries in #68. Here's what I have done:

I have got the .ped and .map files, but when using this command to convert them to PLINK binary files (.bed, .bim, .fam): plink --file yourfile --make-bed --out yourfile_new, I received the error I posted. I used the R script you posted to get the .ped and .map files.

Besides that, while extracting populations from the EIGENSTRAT file, I could not manage to get multiple populations into the output, only one population, like this: eigenstrat_to_plink("v44.3_HO_public",outpref = "master_plink",pops = 316)

I think I will run into more problems in the following steps, as I'm clueless, but that's it for now.
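In the eigenstrat_to_plink() call above, pops = 316 passes a single number. As far as I know, the pops argument expects the population labels themselves (the third column of the .ind file), for example pops = c("Adygei", "Turkmen", "Bulgarian"). Treat that as an assumption to check against the admixtools documentation. The Python sketch below lists the labels available in a .ind file and checks that the ones you want are spelled exactly as in the file. The sample lines are made up.

```python
from collections import Counter


def population_counts(ind_lines):
    """Samples per population in EIGENSTRAT .ind lines: sample ID, sex, population label."""
    return Counter(line.split()[2] for line in ind_lines if line.strip())


def unknown_pops(ind_lines, wanted):
    """Requested labels that do not occur in the .ind file (check spelling and case)."""
    present = population_counts(ind_lines)
    return [pop for pop in wanted if pop not in present]


# Made-up lines; with real data you might use open("v44.3_HO_public.ind") (hypothetical path).
IND = [
    "S1 M Adygei",
    "S2 F Adygei",
    "S3 M Turkmen",
]

if __name__ == "__main__":
    print(population_counts(IND))                      # Counter({'Adygei': 2, 'Turkmen': 1})
    print(unknown_pops(IND, ["Adygei", "Bulgarian"]))  # ['Bulgarian']
```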

**vbnetkhio** · 02-26-2021, 08:11 PM

Did you convert your raw data to Ancestry format first (with allele1 and allele2 in separate columns)?
That seems to be what's causing your error.
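Here is one quick way to check vbnetkhio's point: tally how many fields each data row has. This minimal Python sketch assumes tab-separated raw data. AncestryDNA-style rows should all have 5 fields (allele1 and allele2 in separate columns). 23andMe-style rows have 4 (both alleles in one genotype column). A mix of the two suggests a partly converted file.

```python
from collections import Counter


def field_count_tally(lines):
    """Count data rows by number of tab-separated fields.

    Skips blank lines, '#' comments and a header row starting with 'rsid'.
    """
    tally = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#") or line.lower().startswith("rsid"):
            continue
        tally[len(line.rstrip("\r\n").split("\t"))] += 1
    return tally


# Made-up rows: one AncestryDNA-style row and one unsplit 23andMe-style row
ROWS = [
    "rsid\tchromosome\tposition\tallele1\tallele2",
    "rs1\t1\t100\tA\tG",
    "rs2\t1\t200\tCT",
]

if __name__ == "__main__":
    print(field_count_tally(ROWS))
```

On a correctly converted file, the tally should contain a single key, 5.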

**vbnetkhio** · 02-26-2021, 08:12 PM

...

**Kaspias** · 02-26-2021, 08:16 PM

Originally Posted by vbnetkhio

did you convert your raw data to ancestry format first?

I have. However, I used a super kit (created from 3 different raw data files), which is about 40 MB, while an average raw data file is 15-20 MB. I'm mentioning it in case that's relevant.
