Create a Mock Community for LMAS
Read Data
For the creation of simulated sequencing data, we recommend using InSilicoSeq, a simulator that produces realistic Illumina reads. It is primarily intended for simulating metagenomic samples, but it can also produce sequencing data from a single genome.
It provides error models for Illumina HiSeq, NovaSeq and MiSeq instruments, so that the quality of the simulated reads resembles that of real sequencing data.
Installation
InSilicoSeq can be installed through conda or pip. It requires Python >= 3.5.
conda install -c bioconda insilicoseq
pip install insilicoseq
Alternatively, a Docker container is available.
Usage
To generate mock communities, the following command can be used:
iss generate --genomes genomes.fasta --model hiseq --output issreads
where genomes.fasta should be replaced by a (multi-)fasta file containing the reference genome(s)
from which the simulated reads will be generated.
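When the reference genomes are stored as separate files, they first need to be combined into a single multi-FASTA file. The following is a minimal sketch, assuming each input file ends with a newline. The file names and toy sequences are hypothetical placeholders and are not part of LMAS or InSilicoSeq:

```shell
# Hypothetical toy inputs; replace these with your own reference genomes.
workdir=$(mktemp -d)
printf '>genome_A\nACGTACGT\n' > "$workdir/genome_A.fasta"
printf '>genome_B\nTTGGCCAA\n' > "$workdir/genome_B.fasta"

# Concatenate the individual genomes into one multi-FASTA file.
cat "$workdir/genome_A.fasta" "$workdir/genome_B.fasta" > "$workdir/genomes.fasta"

# Sanity check: count the FASTA header lines (one per sequence).
n_records=$(grep -c '^>' "$workdir/genomes.fasta")
echo "genomes.fasta contains $n_records sequences"
```

Checking the number of header lines before simulating is a cheap way to confirm that no genome was dropped or merged into its neighbour.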
InSilicoSeq comes with three error models that can be selected with the --model parameter: MiSeq, HiSeq and NovaSeq.
You can change the number of CPUs with the --cpus parameter, and the total number of reads to generate with the
-n parameter.
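As an illustration of how these options combine, the sketch below assembles a full command from shell variables and prints it without running it, so it can be inspected first. The values (4 CPUs, 500000 reads) are arbitrary examples, not recommendations:

```shell
# Illustrative values only; adjust them to your data and hardware.
genomes="genomes.fasta"
model="hiseq"
cpus=4
n_reads=500000
prefix="issreads"

# Assemble the command line from the options described above.
cmd="iss generate --genomes $genomes --model $model --cpus $cpus -n $n_reads --output $prefix"

# Dry run: print the command instead of executing it.
echo "$cmd"
```

Once the printed command looks right, it can be pasted into a terminal where InSilicoSeq is installed.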
Alternatively, a custom error model can be built from existing sequencing data once the reads have been aligned to the reference sequences:
iss model -b ref.bam -o my_model
The resulting model file can then be passed to the --model parameter to generate the read data:
iss generate --genomes genomes.fasta --model my_model.npz --output issreads
The mock reads will be saved with the issreads prefix.
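To verify that the expected number of reads was produced, the records in an output file can be counted. This is a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record. The helper name count_reads and the toy file are hypothetical, and the output names issreads_R1.fastq and issreads_R2.fastq used in the note below are an assumption about InSilicoSeq's naming that should be checked against the files actually written:

```shell
# Hypothetical helper: count records in an uncompressed, 4-line-per-record FASTQ file.
count_reads() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Toy FASTQ file with two records, used only to demonstrate the helper.
toy_fastq=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGG\n+\nIIII\n' > "$toy_fastq"

count_reads "$toy_fastq"   # prints 2
```

On real output, the helper would be called as count_reads issreads_R1.fastq and count_reads issreads_R2.fastq. For paired-end data the two counts should match.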