Product News Jul 04, 2008
MUSCA is a two-phase algorithm for computing the multiple sequence alignment of a set of N sequences. During the first phase, Teiresias used to discover motifs that are common among a subset of the N sequences. These motifs are used in the second phase to generate and report the multiple sequence alignment. In particular, the motifs are first mapped to vertices of a directed graph: if two motifs pi and pj do not occur simultaneously in any sequence then there is no edge connecting the corresponding vertices of the graph; the vertices corresponding to pi and pj will be connected by an edge with direction from pi to pj if pi occurs before pj in all the sequences where they both appear; the labels of the edges depend on three things: whether pi and pj are pair-wise incompatible, have overlapping instances, or are pair-wise compatible but do not overlap. Those vertices that are joined by incompatible edges or participate in inconsistent cycles form the basic non-feasible sets. After labeling the vertices of the reduced graph with the help of a simple cost function, greedy algorithm used to obtain a solution to a weighted set-cover problem that essentially identifies the minimum number of motifs/vertices to be removed. The resulting graph is used to determine blocks that involve overlapping feasible motifs. The final alignment obtained by properly aligning the blocks and padding up the existing gaps.