Add Assembler Process

New assemblers can be added with minimal changes to the pipeline, so LMAS can be expanded as novel algorithms are developed. Its implementation in DSL2 greatly facilitates this process.

The implemented assemblers are available in the assembly module, located in the modules folder. The assembly.nf file is the Nextflow file that contains all the assembly processes.

Detailed information on the currently available assemblers is provided in the Short-Read (Meta)Genomic Assemblers page.

Warning

Only assemblers that accept short-read paired-end sequence data as input can be added.

Changing assembler version

The easiest way to change the version of a particular assembler in LMAS is to change the container used by its process. This is done by altering the container property in the containers.config file.

For example, for the SPADES process, the container “cimendes/spades:3.15.0-1” can be altered to another one that implements a different version of the tool:

withName: SPADES {
    container = "cimendes/spades:3.15.0-1"
}

becomes:

withName: SPADES {
    container = "cimendes/spades:3.14.1-1"
}

Warning

You must ensure that the assembler executable is available in the $PATH and that ps is installed in the container for it to work with LMAS or any other Nextflow workflow.

Adding a new assembler

Create an issue with an assembler suggestion

An issue template is available to collect the necessary information for an assembler to be added to LMAS. Some information is required:

  • Container for the execution of the assembler, containing the executable in the PATH and Nextflow’s ps dependency;

  • Command to capture the assembler version, if available;

  • Minimal command to execute the assembler with short-read paired-end sequencing datasets;

  • Parameters (such as k-mer lists) to be passed onto the assembler.

By default, all assemblies are run with 8 CPUs and 32GB of memory.
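
As a rough illustration of how such defaults are typically expressed in a Nextflow configuration, the sketch below assigns 8 CPUs and 32 GB of memory to every process carrying the process_assembly label, which is the label used in the process template in this guide. Whether LMAS sets its defaults through this exact selector is an assumption, so check the configuration files shipped with the pipeline before relying on it.

```nextflow
process {
    // Assumption: all assembler processes share the 'process_assembly' label
    withLabel: process_assembly {
        cpus = 8
        memory = '32 GB'
    }
}
```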

Add process to assembly.nf manually

To add a new assembler to LMAS, a few steps must be completed. All the necessary alterations are performed in the assembly.nf, params.config and containers.config files.

  1. Add the needed parameters

In the params.config file, add a new key-value pair for any parameter necessary to run the assembler, such as the list of k-mer values to use. The fastq input data is passed through the main --fastq parameter, so it should not be included.

Warning

All assemblers in LMAS are toggleable through a --<assembler name> parameter, and this parameter should be included in this file.
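
As a sketch, the entries for a hypothetical assembler could look like the following. The layout of the params scope and the k-mer values are assumptions for illustration only; the names NEW_ASSEMBLER and newassemblerKmers match those used in the process template and channel example in this guide.

```nextflow
params {
    // Toggle: the assembler runs when this is true
    // (exposed on the command line as --NEW_ASSEMBLER)
    NEW_ASSEMBLER = true

    // Assembler-specific parameter; these k-mer values are placeholders
    newassemblerKmers = '21,33,55'
}
```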

  2. Add a new process with the assembler

In the assembly.nf file, you need to add the process to execute the new assembler in the section marked with \PROCESSES.

To create the new process, you can use the following template, substituting NEW_ASSEMBLER with the new assembler name:

process NEW_ASSEMBLER {
    tag { sample_id }
    label 'process_assembly'
    publishDir 'results/assembly/NEW_ASSEMBLER/'

    input:
    tuple val(sample_id), path(fastq)
    val kmers

    when:
    params.NEW_ASSEMBLER

    output:
    tuple val(sample_id), val("NEW_ASSEMBLER"), path('*.fasta'), emit: assembly
    path('.*version'), emit: version

    script:
    """
    # Capture the assembler version and save it into a hidden version file
    <version command> > .${sample_id}_NEW_ASSEMBLER_version

    # Run the assembly, recording pass or fail in .status
    {
        <assembly command>
        echo pass > .status
    } || {
        echo fail > .status
    }
    """
}

Warning

You can access each of the fastq files with ${fastq[0]} and ${fastq[1]}.

You can access the parameter values defined in params.config from the .nf file with params.<parameter>. For example:

IN_NEW_ASSEMBLER_kmers = Channel.value(params.newassemblerKmers)

Warning

Parameters need to be passed into a process through a channel.

This channel definition should be added inside the assembly_wf workflow at the end of the file.

Additionally, the new process needs to be called in the main: section of the workflow.
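
A minimal sketch of how these two additions might sit together inside the workflow is shown below. The input channel name IN_fastq_raw is hypothetical and the calls to the existing assemblers are omitted, so treat this as an outline of the DSL2 pattern rather than the actual contents of assembly.nf.

```nextflow
workflow assembly_wf {
    take:
    IN_fastq_raw   // hypothetical name: channel of [sample_id, [fastq_1, fastq_2]]

    main:
    // Parameters reach a process through a channel
    IN_NEW_ASSEMBLER_kmers = Channel.value(params.newassemblerKmers)

    // Call the new process; arguments follow the order of its input: block
    NEW_ASSEMBLER(IN_fastq_raw, IN_NEW_ASSEMBLER_kmers)

    // Existing assembler calls and the emit: section are omitted here
}
```

After the call, the outputs are available as NEW_ASSEMBLER.out.assembly and NEW_ASSEMBLER.out.version, following the emit names declared in the process.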

  3. Add assembly to main assembly collection

The channel with the assembly must be merged into the main assembly collection channel, emitted by the assembly_wf workflow.

It should look like:

all_assemblies = ABYSS.out.assembly | mix(GATBMINIAPIPELINE.out.assembly,
                                          IDBA.out.assembly,
                                          MEGAHIT.out.assembly,
                                          METAHIPMER2.out.assembly,
                                          METASPADES.out.assembly,
                                          MINIA.out.assembly,
                                          NEW_ASSEMBLER.out.assembly, // new channel added
                                          SKESA.out.assembly,
                                          SPADES.out.assembly,
                                          UNICYCLER.out.assembly,
                                          VELVETOPTIMISER.out.assembly)

Warning

To facilitate reading, please respect the alphabetical order.

  4. Add version to main version collection

The channel with the version information must be merged into the main version collection channel, emitted by the assembly_wf workflow.

It should look like:

all_versions = ABYSS.out.version | mix(GATBMINIAPIPELINE.out.version,
                                       IDBA.out.version,
                                       MEGAHIT.out.version,
                                       METAHIPMER2.out.version,
                                       METASPADES.out.version,
                                       MINIA.out.version,
                                       NEW_ASSEMBLER.out.version,  // new channel added
                                       SKESA.out.version,
                                       SPADES.out.version,
                                       UNICYCLER.out.version,
                                       VELVETOPTIMISER.out.version) | collect

Warning

To facilitate reading, please respect the alphabetical order.

  5. Add the container for the new assembler

The container for the new assembler needs to be added to the containers.config file in the conf/ directory.

It should look like:

withName: NEW_ASSEMBLER {
    container = "<repository>/NEW_ASSEMBLER:<tag>"
}