
Thread: Online service for fastq and BAM conversion

  1. #21
    Veteran Member
    Join Date
    Jul 2019
    Last Online
    03-11-2024 @ 04:25 PM
    Ethnicity
    Unknown
    Country
    Antarctica
    Gender
    Posts
    3,911
    Thumbs Up
    Received: 3,471
    Given: 1,541


    Quote Originally Posted by Tomenable View Post
    Are there currently any interesting samples in FASTQ worth converting?
    These three Hungarian studies are available on ENA and have not yet been converted:
    Maternal lineages from 10-11th century commoner cemeteries of the Carpathian Basin

    Mitogenomic data indicate admixture components of Central-Inner Asian and Srubnaya origin in the conquering Hungarians

    Early medieval genetic data from Ural region evaluated in the light of archaeological evidence of ancient Hungarians

    Polish data
    https://ftp.cngb.org/pub/gigadb/pub/...0310/Raw_data/

  2. #22
    Veteran Member Apricity Funding Member
    "Friend of Apricity"


    Join Date
    Oct 2016
    Last Online
    @
    Ethnicity
    me
    Country
    European Union
    Y-DNA
    R1a > YP1337 > R-BY160486*
    mtDNA
    H3*
    Gender
    Posts
    6,066
    Thumbs Up
    Received: 7,243
    Given: 2,623


    A 30 GB BAM took about a day to produce. I guess that running the FASTQ-to-BAM conversion on my own computer would take a week.

  3. #23
    Veteran Member Apricity Funding Member
    "Friend of Apricity"


    Join Date
    Oct 2016
    Last Online
    @
    Ethnicity
    me
    Country
    European Union
    Y-DNA
    R1a > YP1337 > R-BY160486*
    mtDNA
    H3*
    Gender
    Posts
    6,066
    Thumbs Up
    Received: 7,243
    Given: 2,623


    Quote Originally Posted by Lucas View Post
    A 30 GB BAM took about a day to produce. I guess that running the FASTQ-to-BAM conversion on my own computer would take a week.
    OMG, I've hit the 250 GB limit on my account.

  4. #24
    Veteran Member
    Join Date
    Jul 2019
    Last Online
    03-11-2024 @ 04:25 PM
    Ethnicity
    Unknown
    Country
    Antarctica
    Gender
    Posts
    3,911
    Thumbs Up
    Received: 3,471
    Given: 1,541


    Quote Originally Posted by Lucas View Post
    OMG, I've hit the 250 GB limit on my account.
    After which step?
    Did you extract the commercial-company SNPs in the second step? The file should be much smaller after that.

    Also, you should delete the datasets from previous steps that you no longer need. It's a bit tricky: after deleting them, use "delete hidden datasets" and "purge deleted datasets" in the history settings, and if you have more than one history there is also an option like "purge deleted histories".
    Last edited by vbnetkhio; 09-12-2021 at 08:19 AM.

  5. #25
    Veteran Member Apricity Funding Member
    "Friend of Apricity"


    Join Date
    Jun 2014
    Last Online
    03-13-2024 @ 06:31 PM
    Location
    Helsinki
    Ethnicity
    Finnish
    Country
    Finland
    Y-DNA
    I1
    mtDNA
    H39
    Politics
    Ugly history as it is. Don't blame me.
    Gender
    Posts
    4,729
    Thumbs Up
    Received: 3,437
    Given: 1,436


    I use BWA to align reads.
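    For anyone doing this step locally rather than in Galaxy, here is a minimal sketch of such an alignment (the file names and the hg19 reference path are placeholders, not taken from this thread):

        # Build the BWA index for the reference (one-time step).
        bwa index hg19.fa
        # Align single-end reads and pipe straight into a coordinate sort.
        bwa mem hg19.fa sample.fastq.gz | samtools sort -o sample.sorted.bam -
        # Index the BAM so downstream tools (e.g. bcftools) can use it.
        samtools index sample.sorted.bam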

  6. #26
    Veteran Member Apricity Funding Member
    "Friend of Apricity"


    Join Date
    Jun 2014
    Last Online
    03-13-2024 @ 06:31 PM
    Location
    Helsinki
    Ethnicity
    Finnish
    Country
    Finland
    Y-DNA
    I1
    mtDNA
    H39
    Politics
    Ugly history as it is. Don't blame me.
    Gender
    Posts
    4,729
    Thumbs Up
    Received: 3,437
    Given: 1,436


    Quote Originally Posted by Tomenable View Post
    Are there currently any interesting samples in FASTQ worth converting?
    Some studies release only FASTQ files on ENA or in other archives. You can search by study or project name on ENA.
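    As a sketch, ENA's file-report API can also list a study's FASTQ download links from the command line; PRJEB12345 below is a placeholder accession, not one of the studies mentioned here:

        # Ask ENA for the run accessions and FASTQ FTP links of a study.
        curl "https://www.ebi.ac.uk/ena/portal/api/filereport?accession=PRJEB12345&result=read_run&fields=run_accession,fastq_ftp"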

  7. #27
    Veteran Member Apricity Funding Member
    "Friend of Apricity"


    Join Date
    Oct 2016
    Last Online
    @
    Ethnicity
    me
    Country
    European Union
    Y-DNA
    R1a > YP1337 > R-BY160486*
    mtDNA
    H3*
    Gender
    Posts
    6,066
    Thumbs Up
    Received: 7,243
    Given: 2,623


    Quote Originally Posted by vbnetkhio View Post
    After which step?
    Did you extract the commercial-company SNPs in the second step? The file should be much smaller after that.

    Also, you should delete the datasets from previous steps that you no longer need. It's a bit tricky: after deleting them, use "delete hidden datasets" and "purge deleted datasets" in the history settings, and if you have more than one history there is also an option like "purge deleted histories".
    What you said before was enough; I figured out by myself that I had to use the "purge..." options.

    It was the fault of a very big FASTQ, which produced a 30 GB BAM. Then the VCF was a few times bigger, and I did it twice simultaneously, so I reached the capacity limit. OK, now I'm downloading those BAMs and will convert them in WGS as usual. But probably only the older version would work for them.
    Last edited by Lucas; 09-12-2021 at 06:19 PM.

  8. #28
    Member
    Join Date
    Sep 2020
    Last Online
    02-14-2024 @ 06:30 PM
    Ethnicity
    Ukrainian
    Country
    Ukraine
    Gender
    Posts
    171
    Thumbs Up
    Received: 62
    Given: 171


    Quote Originally Posted by vbnetkhio View Post
    First upload your file, then in "bcftools mpileup" select this file under "input BAM/CRAM", select hg19 under "Select reference genome", and then execute the algorithm. When it finishes, it will output a BCF file and add it to your history.

    After that, run "bcftools call" on the BCF file output from the first step. Just select the file and run it, and you'll get another BCF file. In this step you can already choose to output a VCF file, but it will probably be too big and take too long to download and convert.

    Because of that, I also run "bcftools filter" on the second BCF file. Select the second BCF file, then upload this file: https://easyupload.io/769eny . Under Restrict To > Regions, select "Operate on Regions specified in a history dataset", select the dataset you uploaded (new.tsv), and for your output type select "uncompressed VCF".

    Then you can download this VCF file and convert it to 23andMe format with DNAKitStudio.

    For this file I started with the FASTQ file, not a BAM. In that case there is one extra step at the beginning: convert the FASTQ to BAM with "Map with BWA-MEM". You also need to select the hg19 reference genome and "single" under "Single or Paired-end reads", then select your FASTQ file and run it. Then you'll have a BAM and can do the mpileup and the rest.
    1. Using "bcftools call" on the first BCF file, I produced the second (more compressed) BCF file. But when I run "bcftools filter" and try to produce a VCF, this error occurs:
    [E::bcf_sr_regions_init] Could not parse the file /galaxy-repl/main/files/061/373/dataset_61373762.dat, using the columns 1,2[,-1]
    Failed to read the regions: /galaxy-repl/main/files/061/373/dataset_61373762.dat
    This also happens if I try to run it on the first BCF (61, the original BCF).

    2. Also, "bcftools call" does not make an uncompressed VCF from the second (more compressed) BCF, only from the first one. The error:
    Note: none of --samples-file, --ploidy or --ploidy-file given, assuming all sites are diploid
    Wrong number of PL fields? nals=1 npl=-3
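    For readers following along, the Galaxy steps quoted above correspond roughly to this bcftools pipeline on the command line (a sketch; sample.bam, hg19.fa and new.tsv stand in for the actual history datasets):

        # Step 1: pileup against the hg19 reference, compressed BCF out.
        bcftools mpileup -f hg19.fa -Ob -o step1.bcf sample.bam
        # Step 2: genotype calling with the multiallelic caller.
        bcftools call -m -Ob -o step2.bcf step1.bcf
        # Filtering by a regions file requires an index.
        bcftools index step2.bcf
        # Step 3: restrict to the SNP list and write uncompressed VCF.
        bcftools filter -R new.tsv -Ov -o sample.vcf step2.bcf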

  9. #29
    Veteran Member
    Join Date
    Jul 2019
    Last Online
    03-11-2024 @ 04:25 PM
    Ethnicity
    Unknown
    Country
    Antarctica
    Gender
    Posts
    3,911
    Thumbs Up
    Received: 3,471
    Given: 1,541


    Quote Originally Posted by smd555 View Post
    1. Using "bcftools call" on the first BCF file, I produced the second (more compressed) BCF file. But when I run "bcftools filter" and try to produce a VCF, this error occurs:

    This also happens if I try to run it on the first BCF (61, the original BCF).

    2. Also, "bcftools call" does not make an uncompressed VCF from the second (more compressed) BCF, only from the first one. The error:
    There is probably something wrong with your .dat file; could you post the first few lines of it?

    Btw, you can skip filtering and just output an uncompressed VCF from "bcftools call", but it will probably be very big.
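    For comparison, a regions file that bcftools accepts is plain tab-separated text with the chromosome in the first column and the position in the second, no header row (the positions below are made up for illustration):

        1	752721
        1	776546
        2	136608646

    The chromosome names have to match those in the BCF (e.g. "1" vs. "chr1"), and a header line or Windows line endings in the .tsv can produce exactly this kind of "Could not parse the file" error.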

  10. #30
    Veteran Member Apricity Funding Member
    "Friend of Apricity"


    Join Date
    Oct 2016
    Last Online
    @
    Ethnicity
    me
    Country
    European Union
    Y-DNA
    R1a > YP1337 > R-BY160486*
    mtDNA
    H3*
    Gender
    Posts
    6,066
    Thumbs Up
    Received: 7,243
    Given: 2,623


    Quote Originally Posted by vbnetkhio View Post
    There is probably something wrong with your .dat file; could you post the first few lines of it?

    Btw, you can skip filtering and just output an uncompressed VCF from "bcftools call", but it will probably be very big.
    It is also possible to download the VCF and convert it using PLINK. But maybe this kind of VCF needs additional processing first?
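    As a sketch of the PLINK route (PLINK 1.9; sample.vcf is a placeholder and should contain a single sample):

        # Convert a single-sample VCF to 23andMe-style text output.
        plink --vcf sample.vcf --recode 23 --out sample_23andme

    The "additional processing" usually needed first is along the lines of normalizing the calls, e.g. with "bcftools norm -f hg19.fa", so that PLINK sees clean biallelic SNPs.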

