DEV Community

Luca Cozzuto

Adding nf-core modules to your Nextflow pipeline

nf-core is not only a collection of state-of-the-art pipelines: it also offers plenty of modules that can easily be included in your own workflows. This is a very quick guide on how to do it.

  1. Install nf-core tools and pre-commit

pip install nf-core
pip install pre-commit

  2. Initialize pre-commit by writing a .pre-commit-config.yaml file.

This is an example:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml

Then, inside your git repository, run pre-commit install:

pre-commit install

pre-commit installed at .git/hooks/pre-commit

and run all the hooks once with pre-commit run:

pre-commit run --all-files

trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...............................................................Passed
  3. Add a manifest section to your nextflow.config, together with env and profiles sections:
manifest {
    name = 'GATK_WGS_preprocessing'
    author = 'Luca Cozzuto'
    description = 'A description of your pipeline'
    version = '2.0'
}

env {
  R_PROFILE_USER = "/.Rprofile"
  R_ENVIRON_USER = "/.Renviron"
  PYTHONNOUSERSITE = 1
}

profiles {
    myprofile {
       includeConfig 'conf/myprofile.config'
    }
}
  4. Search for a module:
nf-core modules list remote
  5. Install a module:
nf-core modules install fastqc

When prompted, indicate that you are installing within a pipeline:

nf-core modules install fastqc


                                          ,--./,-.
          ___     __   __   __   ___     /,-._.--~\ 
    |\ | |__  __ /  ` /  \ |__) |__         }  {
    | \| |       \__, \__/ |  \ |___     \`-._,-`-,
                                          `._,._,'

    nf-core/tools version 3.2.0 - https://nf-co.re


WARNING  'repository_type' not defined in .nf-core.yml                                                                                                       
? Is this repository a pipeline or a modules repository? Pipeline
INFO     To avoid this prompt in the future, add the 'repository_type' key to your .nf-core.yml file.                                                        
? Would you like me to add this config now? [y/n] (y): y
INFO     Config added to '.nf-core.yml'                                                                                                                      
INFO     The 'modules.json' file is not up to date. Recreating the 'modules.json' file.                                                                      
? Can't find a ./modules directory. Would you like me to create one? [y/n] (y): y
INFO     Creating ./modules directory in '.'                                                                                                                 
INFO     Installing 'fastqc'                                                                                                                                 
INFO     Use the following statement to include this module:                                                                                                 

 include { FASTQC } from '../modules/nf-core/fastqc/main'   
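As the INFO message suggests, the prompt can be avoided on later installs by declaring the repository type in .nf-core.yml. A minimal sketch, assuming you run it from the pipeline root; it only appends the key when it is not already present:

```bash
# Declare this repository as a pipeline so that `nf-core modules install`
# does not ask again. Appends the key only if .nf-core.yml lacks it.
if ! grep -q '^repository_type:' .nf-core.yml 2>/dev/null; then
    echo 'repository_type: pipeline' >> .nf-core.yml
fi
```

Running it a second time leaves the file unchanged.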
  6. Read the module documentation at https://nf-co.re/modules/fastqc/

The input is a tuple made of a Groovy map with meta information (such as the sample id) and the read files. So I built the input channel this way:

include { FASTQC } from "${projectDir}/modules/nf-core/fastqc"

// Build a channel of [ meta, reads ] tuples, as expected by the FASTQC module
if (params.single == "NO") {
    // paired-end: files are grouped in pairs (size: 2 is used by default)
    Channel
     .fromFilePairs( params.reads, checkIfExists: true )
     .map { id, files -> [ [id: id, single_end: false], files ] }
     .set { reads }
} else {
    // single-end: one file per sample
    Channel
     .fromFilePairs( params.reads, size: 1, checkIfExists: true )
     .map { id, files -> [ [id: id, single_end: true], files ] }
     .set { reads }
}

workflow {
    FASTQC(reads)
}
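To check which sample IDs will end up in meta.id, the shell function below roughly mimics, for one naming scheme only, how fromFilePairs with its default size: 2 groups paired files: the part matched by `*` becomes the key, and a sample missing one of its mates is skipped. The function name list_pairs and the <sample>_1.fastq.gz / <sample>_2.fastq.gz layout are assumptions for illustration, not part of Nextflow:

```bash
# Rough analogue of Channel.fromFilePairs('<dir>/*_{1,2}.fastq.gz').
# Prints "<id> <read1> <read2>" for every complete pair in a directory.
# Assumes the hypothetical naming scheme <sample>_1.fastq.gz / <sample>_2.fastq.gz.
list_pairs() {
    local dir=$1 r1 r2 id
    for r1 in "$dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue              # no match: skip the literal pattern
        id=$(basename "$r1" _1.fastq.gz)      # shared prefix = grouping key (meta.id)
        r2="$dir/${id}_2.fastq.gz"
        if [ -e "$r2" ]; then                 # incomplete pairs are not printed
            echo "$id $r1 $r2"
        fi
    done
}
```

For a directory data containing A_1.fastq.gz and A_2.fastq.gz, `list_pairs data` prints `A data/A_1.fastq.gz data/A_2.fastq.gz`.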
  7. In the main.nf of the fastqc module, the process has the label process_medium, so let's define resources for that label in our profile config file (conf/myprofile.config):
process {
    withLabel: process_medium {
        cpus = 2
        memory = '12G'
    }
}
  8. Now let's install the MultiQC module, connect the FastQC output to it, and report the tool versions:
nf-core modules install multiqc

and include it:

include { MULTIQC } from "${projectDir}/modules/nf-core/multiqc"

As described in the module docs, the FASTQC module emits three channels: html, zip and versions. The first two emit tuples of a meta map and files, while the latter emits just a file. We can plug the zip files into MultiQC, dropping the meta map, in this way:

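workflow {
    fqc = FASTQC(reads)
    ch_versions = fqc.versions
    // keep only the zip files, dropping the meta map
    multiqc_data = fqc.zip.map { meta, zip -> zip }
    // the remaining MULTIQC inputs are passed as empty lists
    MULTIQC(multiqc_data.collect(), [], [], [], [], [])
}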

The other inputs of MULTIQC (such as a custom config file or a logo) can be passed as empty lists ([]), so they are skipped.

We still need to send the software versions to MultiQC. For this, install the utils_nfcore_pipeline subworkflow, which provides the function softwareVersionsToYAML:

nf-core subworkflows install utils_nfcore_pipeline

Let's include it:

include { softwareVersionsToYAML } from "${projectDir}/subworkflows/nf-core/utils_nfcore_pipeline"


and then use it in the workflow:


workflow {
    fqc = FASTQC(reads)
    ch_versions = fqc.versions
    multiqc_data = fqc.zip.map { meta, zip -> zip }

    // STORE VERSIONS OF TOOLS
    // convert the versions to YAML and write them into a single file
    softwareVersionsToYAML(ch_versions)
        .collectFile(
            storeDir: "${params.output}/pipeline_info",
            name: 'nf_core_' + 'pipeline_software_' + 'mqc_' + 'versions.yml',
            sort: true,
            newLine: true
        ).set { ch_collated_versions }

    // add the collated versions file to the files parsed by MultiQC
    multiqc_data = multiqc_data.mix(ch_collated_versions)

    MULTIQC(multiqc_data.collect(), [], [], [], [], [])
}


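To launch the pipeline, here is a hypothetical wrapper that passes the parameters used in this post (params.reads, params.single, params.output) and the myprofile profile. It assumes the workflow is saved as main.nf in the project root and that nextflow is on your PATH; the function name run_pipeline is illustrative only:

```bash
# Hypothetical launcher for the pipeline built above.
# Arguments: read glob, "NO" for paired-end (anything else = single-end), output dir.
run_pipeline() {
    local reads_glob=$1 single=$2 outdir=$3
    nextflow run main.nf \
        -profile myprofile \
        --reads "$reads_glob" \
        --single "$single" \
        --output "$outdir"
}

# Example for paired-end reads named <sample>_1.fastq.gz / <sample>_2.fastq.gz:
# run_pipeline 'data/*_{1,2}.fastq.gz' NO results
```

The read pattern is quoted so that Nextflow, not the shell, expands the glob and the {1,2} pairing pattern.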
