A crucial problem in genome assembly is the discovery and correction of misassembly
errors in draft genomes. A large number of these errors can occur in draft genomes and
add to the cost and time associated with many scientific initiatives, including Genome 10K,
the i5K project, and 1001 Genomes. We develop a method that will enhance the quality of
draft genomes by identifying and removing misassembly errors using paired short read
sequence data and optical mapping data.
We apply our method to various assemblies of the loblolly pine and Francisella tularensis
genomes. Our results demonstrate that we detect more than 54% of extensively misassembled
contigs and more than 60% of locally misassembled contigs in an assembly of Francisella
tularensis, and between 31% and 100% of extensively misassembled contigs and between
57% and 73% of locally misassembled contigs in the assemblies of loblolly pine.
Misassembly Detection using Paired-End Sequence Reads and Optical Mapping Data
by Martin D. Muggli, Simon J. Puglisi, Roy Ronen, and Christina Boucher. In submission.
$ python3 misSEQuel/missequel.py --outdir misSEQuel_out --contigs contigs.fasta \
    --opt_map ecoli_XhoI_om --opt_map ecoli_Swai_om \
    --enzyme XhoI --enzyme SwaI --is_prokaryote \
    --reads1 mc.orig.1.fq --reads2 mc.orig.2.fq
Optical map files (in SOMA 'match' format) and their corresponding enzyme names are assumed to be given in the same order (i.e., the first optical map file corresponds to the first enzyme name, the second to the second, and so on).
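The positional pairing described above can be illustrated with a small sketch. This is not misSEQuel's actual code: the function name pair_maps_with_enzymes and the option destinations opt_maps and enzymes are hypothetical, and the use of optparse is only an assumption. It shows how repeated --opt_map and --enzyme options could be collected in command-line order and matched by position.

```python
from optparse import OptionParser


def pair_maps_with_enzymes(argv):
    """Pair each --opt_map with the --enzyme given at the same position.

    Hypothetical helper for illustration; not part of misSEQuel.
    """
    parser = OptionParser()
    # action="append" collects repeated options in command-line order.
    parser.add_option("--opt_map", action="append", dest="opt_maps")
    parser.add_option("--enzyme", action="append", dest="enzymes")
    options, _ = parser.parse_args(argv)

    # An option that was never given is left as None by optparse.
    opt_maps = options.opt_maps or []
    enzymes = options.enzymes or []
    if len(opt_maps) != len(enzymes):
        raise ValueError("expected exactly one --enzyme per --opt_map")

    # The i-th optical map is matched with the i-th enzyme name.
    return list(zip(opt_maps, enzymes))


pairs = pair_maps_with_enzymes([
    "--opt_map", "ecoli_XhoI_om", "--opt_map", "ecoli_Swai_om",
    "--enzyme", "XhoI", "--enzyme", "SwaI",
])
for opt_map, enzyme in pairs:
    print(opt_map, "<->", enzyme)
```

Under this scheme, swapping the order of the two --enzyme options would silently associate each map with the wrong enzyme, so it is worth double-checking the order before a long run.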
The names of the options can be accessed via the --help option as follows:
$ python3 misSEQuel/missequel.py --help
Usage: missequel.py [options]

Options:
  -h, --help         show this help message and exit
  --reads1=R1
  --reads2=R2
  --contigs=CONTIGS
  --opt_map=OPT_MAP
  --outdir=OUTDIR
  --enzyme=ENZYMES
  --is_prokaryote
  --verbose
misSEQuel is freely available software for academic use. For nonacademic use, please contact the authors.
Send your questions or comments to sequel [dot] help [at] gmail [dot] com.