DEV Community

Adriano De Marino, PhD

BCFtools

Since I started working in bioinformatics, I have run into many problems with computational performance and memory usage when running certain tasks. Bioinformatics generally involves large amounts of data, so optimisation is often required.

In genomics, one of the most widely used formats is the Variant Call Format (VCF).

After years of working with this type of file, I have found bcftools to be one of the best tools for manipulating and modifying it. You can do basically everything to a VCF file with this CLI tool.
 
The main tips I recommend are:

  • Always convert your file from VCF to BCF format; in my experience this makes every task at least 4x faster.
  • The --threads flag will not speed up the processing itself: the extra threads are used only for compression and decompression.
  • When you have to run several commands on your VCF file, such as extraction, annotation and normalisation, chain the bcftools calls with | (pipe) and -Ou (uncompressed BCF output) to avoid writing intermediate files, which would slow the process down.
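
As a minimal sketch of the first tip, the conversion from VCF to BCF could look like the following. The function names, the file name input.vcf.gz and the thread count are placeholders chosen for illustration, not part of any bcftools convention.

```shell
# Sketch: convert a bgzip-compressed VCF to BCF and index it.
# bcf_name and vcf_to_bcf are hypothetical helper names.

# Derive the BCF file name from the VCF name (input.vcf.gz -> input.bcf).
bcf_name() {
    printf '%s\n' "${1%.vcf.gz}.bcf"
}

vcf_to_bcf() {
    in_vcf="$1"
    out_bcf="$(bcf_name "$in_vcf")"
    # -Ob writes compressed BCF; the extra threads only help with compression.
    bcftools view -Ob -o "$out_bcf" --threads 4 "$in_vcf" || return 1
    # Index the BCF so that later region queries can use it.
    bcftools index "$out_bcf"
}

# Usage: vcf_to_bcf input.vcf.gz
```

Subsequent commands can then read input.bcf directly instead of the original VCF.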

An example of such a pipeline:
bcftools view -t chr2 input.vcf.gz -Ou |
bcftools annotate --rename-chrs alias.txt -Ou |
bcftools norm -m - -Oz -o output.vcf.gz --threads 8


This command:

  1. Extracts only the chromosome 2 variants (-t chr2).
  2. Changes the chromosome prefix (--rename-chrs).
  3. Splits multi-allelic sites into bi-allelic records (-m -) and saves the output in compressed VCF format (-Oz).
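
The alias.txt file passed to --rename-chrs is a plain text file with one "old_name new_name" pair per line, separated by whitespace. The sketch below shows one way to generate it, assuming the goal is to strip the "chr" prefix from chromosomes 1 to 22, X and Y. The direction of the renaming and the function name make_alias_file are assumptions made for illustration.

```shell
# Sketch: write a mapping file for bcftools annotate --rename-chrs.
# Assumption: we rename chr1..chr22, chrX, chrY to 1..22, X, Y.
make_alias_file() {
    out="$1"
    # Start from an empty file.
    : > "$out"
    for c in $(seq 1 22) X Y; do
        # One "old_name new_name" pair per line.
        printf 'chr%s %s\n' "$c" "$c" >> "$out"
    done
}

# Usage: make_alias_file alias.txt
```

To add the prefix instead of removing it, swap the two columns in the printf format.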
