Basic Usage
To use LMAS, pass the reference sequences with the --reference parameter and the short-read paired-end raw data
to be assembled with the --fastq parameter.
The optional parameter --md allows the user to pass information on input samples, in a markdown file, to be
presented in the LMAS report.
All complete genomes (reference linear replicons) should be provided in a single file. Due to the ambiguous starting position of a circular replicon, an assembled contig will typically not align to the reference in a single unbroken alignment. Therefore, the linearized replicons are concatenated three times by LMAS to ensure that contigs can fully align even with start-end overlap and regardless of their starting position relative to that of the reference.
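The effect of the triple concatenation can be illustrated with a toy example. This is only a sketch: the sequence is made up, the helper names are hypothetical, and an exact substring match stands in for the real alignment step, which LMAS performs with dedicated tools.

```python
def triple_reference(seq: str) -> str:
    """Concatenate a linearized replicon three times."""
    return seq * 3


def rotate(seq: str, start: int) -> str:
    """Return the same circular sequence, linearized at another position."""
    return seq[start:] + seq[:start]


# Toy circular replicon, linearized at an arbitrary position (made-up data).
reference = "ATGCCGTTAGGCTA"

# The same molecule as an assembler might report it, with a different start.
contig = rotate(reference, 9)

# Against the single copy, the contig has no unbroken match ...
print(contig in reference)

# ... but it matches in one piece against the tripled reference.
print(contig in triple_reference(reference))
```

In this toy case the contig spans the start-end junction of the reference, so only the concatenated sequence contains it as one contiguous stretch.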
The raw data is a collection of sequence fragments from the references and can be either obtained in silico or from real sequencing platforms.
Warning
By default, LMAS expects the input data in a data/ folder, with the reference sequences in data/reference/*.fasta, the read data in data/fastq/*_{1,2}.*, and the markdown file in data/*.md.
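To check that read files follow the naming expected by the default pattern data/fastq/*_{1,2}.*, a small script can group them into pairs. The sketch below is not part of LMAS; the regular expression and the function name are assumptions that only approximate how such a pattern splits a file name into a sample ID and a mate number.

```python
import re
from collections import defaultdict

# Hypothetical approximation of the *_{1,2}.* pattern:
# <sample>_<1 or 2>.<any extension>
PAIR_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.(?P<ext>.+)$")


def group_read_pairs(filenames):
    """Group file names into (forward, reverse) pairs keyed by sample ID."""
    mates = defaultdict(dict)
    for name in filenames:
        match = PAIR_PATTERN.match(name)
        if match:
            mates[match["sample"]][match["mate"]] = name
    # Keep only samples for which both mates are present.
    return {
        sample: (found["1"], found["2"])
        for sample, found in mates.items()
        if "1" in found and "2" in found
    }


files = ["ENN_1.fq.gz", "ENN_2.fq.gz", "LHS_1.fq.gz", "about.md"]
print(group_read_pairs(files))
```

Here LHS_1.fq.gz is dropped because its mate is missing, and about.md is ignored because it does not match the pattern.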
When you clone it, LMAS has the following folder structure:
LMAS
├── bin/
├── containers.config
├── docker/
├── docs/
├── get_data.sh
├── lib/
├── LICENSE
├── LMAS.nf
├── nextflow.config
├── params.config
├── profiles.config
├── README.md
├── resources/
├── resources.config
└── templates/
The LMAS.nf is the main execution file for LMAS.
The get_data.sh bash script file downloads the ZymoBIOMICS Microbial Community Standard data.
The containers.config, nextflow.config, params.config, profiles.config and resources.config are LMAS configuration files.
The bin/ and templates/ folders contain custom LMAS code for data processing.
The docs/ folder contains LMAS documentation source files.
The docker/ folder contains the dockerfile for LMAS’ base container.
The resources/ folder contains the LMAS report compiled code.
Customizing LMAS workflow configuration
Users can customize the workflow execution either by using command-line options, with --<name of parameter> <option>
or by modifying a simple plain-text configuration file, where parameters are set as key-value pairs.
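As an illustration of the command-line route, the sketch below assembles a Nextflow call from key-value overrides. The helper function is hypothetical and only builds the command; the three parameters and their values are the defaults described in this page.

```python
import shlex


def build_lmas_command(overrides):
    """Turn {"name": "value"} overrides into `--name value` options."""
    command = ["nextflow", "run", "LMAS.nf"]
    for name, value in overrides.items():
        command += [f"--{name}", str(value)]
    return command


command = build_lmas_command({
    "reference": "data/reference/*.fasta",
    "fastq": "data/fastq/*_{1,2}.*",
    "md": "data/*.md",
})

# shlex.join quotes the glob patterns so that the shell passes them
# to Nextflow unexpanded.
print(shlex.join(command))
```

Quoting the patterns matters when the command is typed in a shell, since unquoted globs would be expanded before Nextflow sees them.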
There are five configuration files in LMAS:
nextflow.config
This is Nextflow's main configuration file. It should not be edited.
params.config
The params.config file includes all available parameters for LMAS and their respective default values.
containers.config
The containers.config file includes the container directive for each process in LMAS.
These containers are retrieved from Docker Hub if they do not yet exist locally.
Warning
You can change the container string to any other value, but it should point to an image that exists on Docker Hub or locally.
profiles.config
The profiles.config file includes a set of pre-made profiles with all possible combinations of executors and container engines.
You can add new ones or modify an existing one.
resources.config
The resources.config file includes the CPUs and memory directives provided for each assembler in LMAS.
Warning
The memory directive increments automatically when a task is retried. If the directive is set to {16.Gb*task.attempt}, the memory requested will be 16 Gb multiplied by the attempt number: 16 Gb on the first attempt, 32 Gb on the second, and so on.
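The arithmetic behind this directive can be sketched as follows. The function is purely illustrative and not LMAS or Nextflow code; it assumes that the attempt counter starts at 1, as task.attempt does in Nextflow.

```python
def requested_memory_gb(base_gb: int, attempt: int) -> int:
    """Memory requested on a given attempt, mirroring base * task.attempt."""
    return base_gb * attempt


# A 16 Gb base value grows linearly with each retry.
for attempt in (1, 2, 3):
    print(attempt, requested_memory_gb(16, attempt))
```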
ZymoBIOMICS Microbial Community Standard Data
As a proof of concept, the eight bacterial genomes and four plasmids of the ZymoBIOMICS Microbial Community Standards were used as reference. The input for LMAS was raw sequence data of the mock communities, with an even and a logarithmic distribution of species, from both a real sequencing run and simulated datasets. The simulated datasets were generated with and without error and match the species distribution of the real data.
The reference sequences and the mock sample are available at Zenodo: https://doi.org/10.5281/zenodo.4588969
The even and log distributed raw sequence data are available at https://www.ebi.ac.uk/ena/browser/view/ERR2984773 and https://www.ebi.ac.uk/ena/browser/view/ERR2935805, respectively.
LMAS’ repository includes a script that downloads the ZymoBIOMICS data and structures it as the default input for LMAS. To run it, simply execute:
sh get_data.sh
The files will be saved in the following structure:
data/
├── about.md
├── fastq
│   ├── ERR2935805_1.fq.gz
│   ├── ERR2935805_2.fq.gz
│   ├── ERR2984773_1.fq.gz
│   ├── ERR2984773_2.fq.gz
│   ├── EMS_1.fq.gz
│   ├── EMS_2.fq.gz
│   ├── ENN_1.fq.gz
│   ├── ENN_2.fq.gz
│   ├── LHS_1.fq.gz
│   ├── LHS_2.fq.gz
│   ├── LNN_1.fq.gz
│   └── LNN_2.fq.gz
└── reference
    └── ZymoBIOMICS_genomes.fasta?download=1
This is already the expected input for LMAS. To execute LMAS you simply need to call the LMAS.nf execution file with Nextflow.
nextflow run LMAS.nf