The Merge module
The Merge module implements a modified overlap-layout-consensus algorithm (Li et al. 2012) to merge contigs according to the overlaps between them. The procedure is as follows:
  1. Make a copy. Because the input contigs may come from either the sense or the antisense strand, a reverse-complemented copy of each input contig is added, which ensures that every contig is present in the sense orientation in at least one copy.
  2. Self-to-self alignment. Run blastn (BLAST 2.9.0) with default options to align the contigs against themselves.
  3. Analyze the BLAST output. Only forward overlapping matches are kept, that is, matches in which the query and subject sequences are both in the sense strand and the match joins the end of the upstream sequence to the beginning of the downstream sequence. In addition, nested overlaps are discarded to remove short redundant contigs.
  4. Generate a directed graph. A directed graph is built from the overlap information above. The nodes of the graph represent the contigs and the directed edges represent overlaps between contigs. Since each contig is present in two copies with opposite orientations, the graph ideally contains two major circles.
  5. Cut edges. The transitively inferable edges, non-branching stretches and alternative paths that go through the same node are removed. Edges that cross between the two circles are also removed. All removed edges represent incorrect overlapping relationships, mainly due to repeated sequences, especially the inverted repeat (IR) regions in plastid genomes.
  6. Extract the full circle found in the graph. According to the overlap information, the program merges contigs to generate circular sequences.
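Steps 1 and 3 above can be illustrated with a minimal Python sketch. It is not the module's actual code: the function names, the "_rc" suffix for reverse-complemented copies and the `slack` tolerance are assumptions made for illustration. Each hit is assumed to be a dictionary keyed by the BLAST tabular column names (qseqid, sseqid, qstart, qend, sstart, send) with 1-based coordinates, where a minus-strand match is reported with sstart greater than send.

```python
# Illustrative sketch only; all names here are hypothetical.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def add_reverse_copies(contigs):
    """Step 1: add a reverse-complemented copy of every contig."""
    doubled = dict(contigs)
    for name, seq in contigs.items():
        doubled[name + "_rc"] = reverse_complement(seq)
    return doubled


def is_forward_overlap(hit, lengths, slack=0):
    """Step 3: keep a hit only if the end of the query (upstream contig)
    matches the start of the subject (downstream contig), both in sense."""
    if hit["qseqid"] == hit["sseqid"]:
        return False  # a contig trivially matches itself
    if hit["sstart"] > hit["send"]:
        return False  # subject on the minus strand
    qlen = lengths[hit["qseqid"]]
    slen = lengths[hit["sseqid"]]
    covers_query = hit["qstart"] <= 1 + slack and hit["qend"] >= qlen - slack
    covers_subject = hit["sstart"] <= 1 + slack and hit["send"] >= slen - slack
    if covers_query or covers_subject:
        return False  # nested overlap: one contig lies inside the other
    return hit["qend"] >= qlen - slack and hit["sstart"] <= 1 + slack


lengths = {"a": 100, "b": 80}
hit = {"qseqid": "a", "sseqid": "b",
       "qstart": 81, "qend": 100, "sstart": 1, "send": 20}
print(is_forward_overlap(hit, lengths))  # True
```

The `slack` parameter (default 0) is only a convenience for tolerating a few unaligned bases at contig ends; the source does not state whether such a tolerance is used.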
If the input contigs contain enough head-to-tail overlapping sequences, a whole plastid genome is likely to be formed and the program may generate two circular sequences with opposite orientation. Finally, the Validation module is called to test the output.
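To show how two circles with opposite orientation can arise, the graph steps (4 to 6) can be sketched on a toy example. This is a simplified illustration under stated assumptions, not the module's implementation: the function names and the "_rc" suffix are hypothetical, only two-hop transitive edges are removed, the removal of edges between the two circles is not modeled, and the sketch returns the contig order of each circle rather than the merged sequence.

```python
# Illustrative sketch only; all names here are hypothetical.

def mirror(overlaps):
    """An overlap up -> down implies down_rc -> up_rc between the copies."""
    return [(down + "_rc", up + "_rc") for up, down in overlaps]


def build_graph(overlaps):
    """Step 4: nodes are contigs, directed edges are forward overlaps."""
    graph = {}
    for up, down in overlaps:
        graph.setdefault(up, set()).add(down)
        graph.setdefault(down, set())
    return graph


def remove_transitive_edges(graph):
    """Step 5 (partial): drop u -> w when u -> v and v -> w both exist."""
    reduced = {u: set(vs) for u, vs in graph.items()}
    for u, successors in graph.items():
        for v in successors:
            for w in graph[v]:
                if w != u and w != v and w in successors:
                    reduced[u].discard(w)
    return reduced


def extract_cycles(graph):
    """Step 6: follow unique successors and report each closed path once."""
    cycles, seen = [], set()
    for start in sorted(graph):
        path, node = [], start
        while node not in seen and node not in path and len(graph[node]) == 1:
            path.append(node)
            node = next(iter(graph[node]))
        if node in path:  # the walk closed on itself
            cycles.append(path[path.index(node):])
        seen.update(path)
    return cycles


# Four contigs forming a circle, plus one transitively inferable edge a -> c.
forward = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
graph = remove_transitive_edges(build_graph(forward + mirror(forward)))
cycles = extract_cycles(graph)
print(cycles)  # [['a', 'b', 'c', 'd'], ['a_rc', 'd_rc', 'c_rc', 'b_rc']]
```

The two reported circles list the same contigs in reversed order, which corresponds to the two circular sequences with opposite orientation described above.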
Results