
Adriano De Marino, PhD


BCFtools

Since I started working in bioinformatics, I have faced many issues with computational performance and memory usage when running certain tasks. Bioinformatics generally involves large amounts of data, so optimisation is often required.

In genomics, one of the most widely used formats is the Variant Call Format (VCF).

After years of working with this type of file, I have found bcftools to be one of the best tools for manipulating and modifying it. You can do basically everything to a VCF file with this CLI software.
 
The main tips I recommend are:

  • Always convert your file from VCF to BCF format; this makes every task run at least 4x faster.
  • The --threads flag will not speed up the processing itself: multithreading is only applied to compression and decompression.
  • When you have to run multiple commands on your VCF file, such as extraction, annotation and normalisation, chain the bcftools commands with | (pipe) and -Ou (uncompressed BCF output) to avoid writing intermediate files, which would slow down your process.
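
The first tip can be sketched as follows. This is a minimal example, assuming a hypothetical input file named input.vcf.gz in the current directory; the thread count of 4 is arbitrary, and the guard simply skips the conversion when bcftools or the file is missing.

```shell
#!/usr/bin/env bash
# Hypothetical file name; replace it with your own.
in_vcf="input.vcf.gz"
out_bcf="${in_vcf%.vcf.gz}.bcf"   # input.vcf.gz -> input.bcf

if command -v bcftools >/dev/null 2>&1 && [ -f "$in_vcf" ]; then
    # -Ob writes compressed BCF; --threads only helps with the compression
    bcftools view -Ob -o "$out_bcf" --threads 4 "$in_vcf"
    # Index the BCF so that later commands can query it by region
    bcftools index "$out_bcf"
else
    echo "bcftools or $in_vcf not found; skipping conversion" >&2
fi
```

Once the BCF exists, it can be passed to later bcftools commands in place of the original VCF.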

An example of such a pipeline:
bcftools view -t chr2 input.vcf.gz -Ou |
bcftools annotate --rename-chrs alias.txt -Ou |
bcftools norm -m - -Oz -o output.vcf.gz --threads 8


This command:

  1. Extracts only chromosome 2 variants (-t chr2).
  2. Changes the chromosome prefix (--rename-chrs).
  3. Splits multi-allelic sites into bi-allelic records (-m -) and saves the output in compressed VCF format (-Oz).
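
The alias.txt file used in step 2 is not shown above. As a rough sketch, --rename-chrs expects one "old_name new_name" pair per line, separated by whitespace. The mapping below is an assumption for illustration: it strips the "chr" prefix from the autosomes and sex chromosomes and renames chrM to MT, which may not match your reference.

```shell
#!/usr/bin/env bash
# Build a hypothetical alias.txt that maps "chrN" names to plain "N" names.
# Each line has the form: old_name<TAB>new_name
alias_file="alias.txt"
: > "$alias_file"   # create the file, or empty it if it already exists

for c in $(seq 1 22) X Y; do
    printf 'chr%s\t%s\n' "$c" "$c" >> "$alias_file"
done

# Assumed convention for the mitochondrial contig; adjust to your reference
printf 'chrM\tMT\n' >> "$alias_file"
```

To go the other way (adding the prefix), swap the two columns.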
