Installation
Container engine
All components of LMAS are executed in docker containers, which means that you’ll need to have a container engine installed. The container engines available are the ones supported by Nextflow:
Docker
Singularity
Shifter (undocumented)
If you already have any one of these installed, you are good to go as the provided docker containers are compatible with all engines available. If not, you’ll need to install one.
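To see whether one of these engines is already installed, a quick check along the following lines can help. This is a minimal sketch, not part of LMAS: the helper name `find_engines` is hypothetical, and it assumes the engines are installed under their usual executable names (`docker`, `singularity`, `shifter`) and are on the PATH.

```shell
#!/bin/sh
# Print the name of every supported container engine found on the PATH.
# Assumption: the engines use their usual executable names.
find_engines() {
    for engine in docker singularity shifter; do
        if command -v "$engine" >/dev/null 2>&1; then
            echo "$engine"
        fi
    done
}

found="$(find_engines)"
if [ -n "$found" ]; then
    echo "Container engine(s) found:"
    echo "$found"
else
    echo "No container engine found; install Docker or Singularity first."
fi
```

Finding an executable only shows that the engine is installed; it does not confirm that the current user has permission to run containers with it.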
Singularity
Singularity is available to download and install here. Make sure that you have Singularity v2.5.x or higher. Note that Singularity should be installed as root and be available on the machine(s) that will run the Nextflow processes.
Important
Singularity is available as a Bioconda package. However, conda installs Singularity in user space without root privileges, which may prevent Singularity images from being downloaded correctly. Installing Singularity via Bioconda is therefore not recommended.
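The version requirement above can be checked from the command line. The following is a hedged sketch: `extract_version` and `version_ge` are hypothetical helpers written for this example, and the parsing assumes that `singularity --version` prints a dotted numeric version somewhere in its output (for example `2.5.2-dist` or `singularity version 3.8.0`).

```shell
#!/bin/sh
# Pull the first dotted numeric version out of a string, e.g.
# "singularity version 3.8.0" -> "3.8.0". Prints nothing if none is found.
extract_version() {
    printf '%s\n' "$1" | sed -n 's/^[^0-9]*\([0-9][0-9.]*\).*$/\1/p'
}

# Succeed if dotted version $1 is at least $2, comparing field by field.
version_ge() {
    a=$1
    b=$2
    while [ -n "$a" ] || [ -n "$b" ]; do
        fa=${a%%.*}
        fb=${b%%.*}
        [ "${fa:-0}" -gt "${fb:-0}" ] && return 0
        [ "${fa:-0}" -lt "${fb:-0}" ] && return 1
        case "$a" in *.*) a=${a#*.} ;; *) a= ;; esac
        case "$b" in *.*) b=${b#*.} ;; *) b= ;; esac
    done
    return 0
}

if command -v singularity >/dev/null 2>&1; then
    installed="$(extract_version "$(singularity --version)")"
    if version_ge "$installed" 2.5; then
        echo "Singularity $installed meets the v2.5.x requirement."
    else
        echo "Singularity version '$installed' is too old or could not be parsed."
    fi
else
    echo "singularity was not found on the PATH."
fi
```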
Docker
Docker can be installed by following the instructions on the website: https://www.docker.com/community-edition#/download. To run Docker as a non-root user, follow the instructions at: https://docs.docker.com/install/linux/linux-postinstall/#manage-docker-as-a-non-root-user
Installation through conda
LMAS can be installed through `Conda <https://conda.io/en/latest/>`_, an open source package and environment management system that runs on Windows, macOS and Linux. Once Conda is installed, LMAS is available from `Bioconda <https://anaconda.org/bioconda/LMAS>`_ and can be installed with:
conda install -c bioconda lmas
Manual installation
LMAS can also be installed manually from GitHub (https://github.com/cimendes/LMAS). It requires a Nextflow installation (version ≥ 21.04.1) and can be used on any POSIX-compatible system (Linux, OS X, etc.). All components of LMAS are executed in Docker containers, so a container engine is required.
Nextflow integrates with several container engines, such as Shifter or Singularity, so no particular one is required.
To ensure the robustness of the LMAS workflow and of the custom Python code for the quality assessment of assemblies, both the main workflow and the Python templates are continuously tested with GitHub Actions and pytest.
Below is a step-by-step guide on how to install LMAS and all its dependencies.
Nextflow (version 20.01.0 or higher) can be used on any POSIX compatible system (Linux, OS X, etc). It requires BASH and Java 8 (or higher) to be installed.
Important
Instructions on how to install Nextflow are available here
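Before installing Nextflow, it can be useful to confirm that the Java requirement stated above is met. The sketch below is illustrative only: `java_major` is a hypothetical helper, and the parsing assumes that `java -version` prints the version as a quoted string on its first line (for example `"1.8.0_281"` for Java 8 or `"11.0.2"` for Java 11).

```shell
#!/bin/sh
# Reduce a Java version string to its major version:
# "1.8.0_281" -> 8 (legacy 1.x scheme), "11.0.2" -> 11, "17" -> 17.
java_major() {
    case "$1" in
        1.*) printf '%s\n' "$1" | cut -d. -f2 ;;
        *)   printf '%s\n' "$1" | sed 's/[^0-9].*$//' ;;
    esac
}

if command -v java >/dev/null 2>&1; then
    # `java -version` writes to stderr, so redirect it before parsing.
    raw="$(java -version 2>&1 | sed -n '1s/^[^"]*"\([^"]*\)".*$/\1/p')"
    major="$(java_major "$raw")"
    case "$major" in
        ''|*[!0-9]*)
            echo "Could not parse the Java version." ;;
        *)
            if [ "$major" -ge 8 ]; then
                echo "Java $major found; the Java 8 requirement is met."
            else
                echo "Java $major found; Nextflow needs Java 8 or higher."
            fi ;;
    esac
else
    echo "java was not found on the PATH."
fi
```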
You can clone this repository with git.
git clone https://github.com/cimendes/LMAS.git
All files will be on your local machine.
To run LMAS you can simply call it with:
./LMAS <options>
If no option or --help is provided, LMAS will display its help message. Otherwise, the --fastq and --reference options are mandatory. By default they are set to 'data/fastq/*_{1,2}.*' and 'data/reference/*.fasta' respectively.
The main execution file for Nextflow is main.nf. Alternatively, you can call LMAS directly with Nextflow:
nextflow run main.nf <options>
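As an illustration of the two mandatory options, the sketch below prints a complete command line using the default path expressions. The helper `print_lmas_cmd` is hypothetical and only echoes the command so it can be reviewed before running. The single quotes around the patterns are there because Nextflow generally expects glob patterns to reach it unexpanded; whether the `./LMAS` wrapper needs the same quoting is an assumption.

```shell
#!/bin/sh
# Print an LMAS invocation with the path expressions wrapped in single
# quotes, so the shell passes the glob patterns through unexpanded.
print_lmas_cmd() {
    printf "nextflow run main.nf --fastq '%s' --reference '%s'\n" "$1" "$2"
}

# Default patterns from the LMAS help message; replace with your own paths.
FASTQ='data/fastq/*_{1,2}.*'
REFERENCE='data/reference/*.fasta'

print_lmas_cmd "$FASTQ" "$REFERENCE"
```

The printed line can then be copied into the terminal from the directory that contains main.nf.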
To use LMAS the following options are available:
_ __ __ _ ___
/\︵︵/\ | | | \/ | /_\ / __|
(◕('人')◕) | |__| |\/| |/ _ \\__ \
|︶| |____|_| |_/_/ \_\___/
Last Metagenomic Assembler Standing
Input parameters:
--fastq Path expression to paired-end fastq files.
(default: data/fastq/*_{1,2}.*)
--reference Path to the genome reference fasta file.
(default: data/reference/*.fasta)
--md Path to markdown with input sample description for report (optional).
(default: data/*.md)
Mapping and filtering parameters:
--minLength Value for minimum contig length, in basepairs.
(default: 1000)
--mapped_reads_threshold Value for the minimum percentage of a read aligning to the
contig to be considered as mapped.
(default: 0.75)
Assembly quality assessment parameters:
--n_target Target value for the N, NA and NG metrics, ranging from 0 to 1.
(default: 0.5)
--l_target Target value for the L metric, ranging from 0 to 1.
(default: 0.5)
--plot_scale Scale of x-axis for the L, NA and NG metrics plots.
Allowed values: 'linear' or 'log'.
(default: log)
Assembly execution parameters:
--abyss Boolean controlling the execution of the ABySS assembler.
(default: true)
--abyssKmerSize K-mer size for the ABySS assembler, as an integer.
(default: 96)
--abyssBloomSize Bloom filter size for the ABySS assembler.
It must be a string with a value and a unit.
(default: 2G)
--gatb_minia Boolean controlling the execution of the GATB Minia Pipeline assembler.
(default: true)
--gatbKmerSize K-mer sizes for the GATB Minia Pipeline assembler.
It must be a string with the values separated by commas.
(default: 21,61,101,141,181)
--gatb_besst_iter Number of iterations during Besst scaffolding for the
GATB Minia Pipeline assembler.
(default: 10000)
--gatb_error_correction Boolean controlling whether to skip error correction for the
GATB Minia Pipeline assembler.
(default: false)
--idba Boolean controlling the execution of the IDBA-UD assembler.
(default: true)
--metahipmer2 Boolean controlling the execution of the MetaHipMer2 assembler.
(default: true)
--metahipmer2KmerSize K-mer sizes for the MetaHipMer2 assembler.
It must be a string with the values separated by commas.
(default: 21,33,55,77,99)
--minia Boolean controlling the execution of the minia assembler.
(default: true)
--miniaKmerSize K-mer size for the minia assembler, as an integer.
(default: 31)
--megahit Boolean controlling the execution of the MEGAHIT assembler.
(default: true)
--megahitKmerSize K-mer sizes for the MEGAHIT assembler.
It must be a string with the values separated by commas.
(default: 21,29,39,59,79,99,119,141)
--metaspades Boolean controlling the execution of the metaSPAdes assembler.
(default: true)
--metaspadesKmerSize K-mer sizes for the metaSPAdes assembler.
It must be a string with 'auto' or the values separated by spaces.
(default: auto)
--spades Boolean controlling the execution of the SPAdes assembler.
(default: true)
--spadesKmerSize K-mer sizes for the SPAdes assembler.
It must be a string with 'auto' or the values separated by spaces.
(default: auto)
--skesa Boolean controlling the execution of the SKESA assembler.
(default: true)
--unicycler Boolean controlling the execution of the Unicycler assembler.
(default: true)
--velvetoptimiser Boolean controlling the execution of the VelvetOptimiser assembler.
(default: true)
--velvetoptimiser_hashs Starting K-mer size for the VelvetOptimiser assembler, as an integer.
(default: 19)
--velvetoptimiser_hashe End K-mer size for the VelvetOptimiser assembler, as an integer.
(default: 31)
Execution resources parameters:
--cpus Number of CPUs for the assembly and mapping processes, as an integer.
This resource is doubled for each retry until max_cpus is reached.
(default: 8)
--memory Memory for the assembly and mapping processes, in the format of
'value'.'unit'.
This resource is doubled for each retry until max_memory is reached.
(default: 32 GB)
--time Time limit for the assembly and mapping processes, in the format of
'value'.'unit'.
This resource is doubled for each retry until max_time is reached.
(default: 1d)
--max_cpus Maximum number of CPUs for the assembly and mapping processes,
as an integer. It overrides the --cpus parameter.
(default: 32)
--max_memory Maximum memory for the assembly and mapping processes, in the format of
'value'.'unit'. It overrides the --memory parameter.
(default: 100 GB)
--max_time Maximum time for the assembly and mapping processes, in the format of
'value'.'unit'. It overrides the --time parameter.
(default: 3d)
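The retry behaviour described for the execution resources can be illustrated with a little arithmetic. The sketch below is one reading of the help text and not LMAS's actual implementation: it assumes the first attempt uses the base value, each retry doubles the previous request, and the result is capped at the corresponding maximum. The helper name `scaled` is hypothetical.

```shell
#!/bin/sh
# Resource requested on a given attempt: base * 2^(attempt - 1), capped at max.
# Assumption: attempt 1 uses the base value unchanged.
scaled() {
    value=$1
    cap=$2
    tries=$3
    i=1
    while [ "$i" -lt "$tries" ]; do
        value=$((value * 2))
        i=$((i + 1))
    done
    if [ "$value" -gt "$cap" ]; then
        value=$cap
    fi
    echo "$value"
}

# Defaults from the help message: 8 CPUs (max 32) and 32 GB (max 100 GB).
for attempt in 1 2 3 4; do
    echo "attempt $attempt: $(scaled 8 32 "$attempt") CPUs, $(scaled 32 100 "$attempt") GB"
done
```

With the default values, the CPU request would reach its cap on the third attempt, and the memory request would be cut from 128 GB to the 100 GB maximum on that same attempt.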