This page contains the information needed to install and run Twin.
What is Twin?
Twin is a program that aligns in-silico digested contigs to an optical map.
What type of input is expected for Twin?
The Twin executable consumes two files:
- A binary file containing a sequence of 32-bit integers, each representing a fragment size in bp. This file can be created with the included script om2bytes.py from a text file in the following format: each line contains two decimal numbers, the size of the fragment and the standard deviation (both in kb), separated by white space. The standard deviation is ignored. (This is the same file format used by the match executable distributed with SOMA v2.)
- A text file containing in-silico digested contigs. This file contains pairs of lines. The first line in each pair contains an identifier, the contig length in bp, and the number of restriction sites, separated by white space. The second line contains a white-space-delimited list of the restriction site positions. (This is also the format used by the match program distributed with SOMA v2.) These files can be produced from an assembly in FASTA format using the included digest.py script. (Usage example for the AflII enzyme, which cuts 1 bp after the beginning of its recognition sequence: "digest.py contigs.fasta CTTAAG 1 > contigs.silico")
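The in-silico contig format described above can be illustrated with a short Python sketch. It is not the code of digest.py. The function names are hypothetical, and it assumes that a cut position is the 0-based start of each recognition-site match plus the given offset, and that only the sequence as given is searched. The real script may count positions differently.

```python
import re


def restriction_sites(sequence, recognition, cut_offset):
    """Return cut positions for every occurrence of the recognition sequence.

    Assumption: position = 0-based match start + cut_offset.
    """
    seq = sequence.upper()
    # A lookahead pattern also reports overlapping occurrences.
    pattern = re.compile("(?=" + re.escape(recognition.upper()) + ")")
    return [m.start() + cut_offset for m in pattern.finditer(seq)]


def silico_record(identifier, sequence, recognition, cut_offset):
    """Format one contig as the two-line record described in the text."""
    sites = restriction_sites(sequence, recognition, cut_offset)
    # Line 1: identifier, contig length in bp, number of restriction sites.
    header = "%s %d %d" % (identifier, len(sequence), len(sites))
    # Line 2: white-space-delimited restriction site positions.
    return header + "\n" + " ".join(str(s) for s in sites)


if __name__ == "__main__":
    # Toy contig with two CTTAAG sites; real input would come from a FASTA file.
    print(silico_record("contig1", "AACTTAAGGGGGCTTAAGTT", "CTTAAG", 1))
```

The toy contig is 20 bp long and contains the recognition sequence twice, so its header line carries the length 20 and a site count of 2.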
Obtaining and installing Twin
Twin can be obtained from the Download page.
Installation
Twin is known to require Boost and sdsl-lite to be installed.
Notes which may help with installation on CentOS 6 can be found here.
After installing these packages, modify the Makefile to point to your install locations for Boost and sdsl-lite. (It is probably easiest to use your Linux distribution's version of Boost rather than building it yourself.) To install Twin from the source package, unpack the tarball, change to the new directory, and build as follows:
tar zxvf twin-1.0.tar.gz
cd twin/
make
Using Twin
A tutorial is available to supplement the following documentation.
The Twin executable reads in-silico digested contigs in the same file format as SOMA v2. The included script "om2bytes.py" must be used to convert SOMA-formatted optical maps into a binary file, which is the format Twin uses for optical map input. Redirect Twin's standard output to a file (e.g. "twin --silico_map insilico_contigs.txt --opt_map optical_map.bin > twin_out.txt"). The included script "twin2psl.py" converts a file containing Twin's standard output into .psl format.
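To make the optical map conversion concrete, here is a minimal Python sketch of the kind of transformation involved: SOMA-style text lines (size and standard deviation in kb) become a packed sequence of 32-bit integers in bp. It is not the code of om2bytes.py. The function name is hypothetical, and the byte order (little-endian), the unsigned integer type, and the rounding rule are all assumptions. Use the included script to produce files for actual Twin runs.

```python
import struct


def om_text_to_bytes(text):
    """Pack fragment sizes from SOMA-style optical map text as 32-bit integers."""
    sizes_bp = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        size_kb = float(fields[0])  # the second field (std. dev.) is ignored
        sizes_bp.append(int(round(size_kb * 1000)))  # kb -> bp (assumed rounding)
    # "<" = little-endian, "I" = unsigned 32-bit integer (both assumptions).
    return struct.pack("<%dI" % len(sizes_bp), *sizes_bp)


if __name__ == "__main__":
    data = om_text_to_bytes("12.5 0.3\n3.25 0.1\n")
    print(len(data))  # two fragments at 4 bytes each
```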
The following is a detailed description of the options used to control Twin:
--help                          produce help message
--verbose                       show successful steps in approximate backtracking search
--opt_map arg                   REQUIRED: set optical map binary file
--silico_map arg                REQUIRED: set in-silico digested contigs file
--fval arg                      precision/recall tradeoff (default 4.0)
--search_radius arg             radius around silico fragment size that should be searched for optmap candidates, i.e. tolerance (default 1000)
--largest_maybe_frag arg        size below which Twin should consider discarding in-silico digested fragments (default 1000)
--smallest_frag_length arg      size below which in-silico digested fragments should always be discarded (default 250)
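When Twin is driven from a script, the options above can be assembled programmatically. The following Python sketch builds the argument list and redirects standard output to a file, as recommended earlier. The helper names are hypothetical, and it assumes the executable is reachable as "twin" on the PATH. Only option names listed above are used.

```python
import subprocess

# Value-taking options documented above (the two required ones are handled separately).
OPTIONAL_ARGS = {"fval", "search_radius", "largest_maybe_frag", "smallest_frag_length"}


def twin_command(silico_map, opt_map, verbose=False, **options):
    """Build the argument list for a Twin run."""
    unknown = set(options) - OPTIONAL_ARGS
    if unknown:
        raise ValueError("unknown option(s): " + ", ".join(sorted(unknown)))
    cmd = ["twin", "--silico_map", silico_map, "--opt_map", opt_map]
    if verbose:
        cmd.append("--verbose")
    for name in sorted(options):
        cmd += ["--" + name, str(options[name])]
    return cmd


def run_twin(cmd, out_path):
    """Run Twin and write its standard output to out_path."""
    with open(out_path, "w") as out:
        subprocess.run(cmd, stdout=out, check=True)


if __name__ == "__main__":
    print(" ".join(twin_command("insilico_contigs.txt", "optical_map.bin", fval=4.0)))
```

Validating option names before the run catches misspelled flags early instead of relying on the error message of the executable.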
FAQ
Questions and answers will be posted here as they arise in the use of Twin.