Figure 6. (A) Target T1145 as split into two EUs. (B) Grishin plots for the four original domains of T1145, as marked in panel A. The upper left plot in panel B shows that domains 1 and 2 should be split, while the remaining three plots show that domains 2, 3 and 4 should be joined.
The last example is target T1169, the mosquito salivary protein SGS1, which is involved in mosquito-borne diseases [28]. It is the largest monomeric target in the history of CASP (3364 residues in the sequence; 2735 residues resolved in the structure). It has a cocoon-shaped structure with multiple domains and extensive inter-domain interactions (Figure 7), which makes defining EUs a significant challenge. The top-ranked SWORD/SWORD2 splitting schema suggested 7 domains; the domain definition from the authors (Figure 7B [28]) and the results of HHsearch homology searches (Figure 7C) offered additional help in defining domains. Domains were originally defined so that the following 7 areas were separated: (1) the N-terminal β-propeller (blue in panel A, orange in panel B), (2) the region between the two β-propellers (HHsearch), (3) β-propeller 2, (4) the region after the β-propeller, (5) the CBM domain, (6) the lectin-CRD domain, and (7) the area containing the wedge domain up to the TM domain (HHsearch). The Grishin plot analysis suggested merging the two domains surrounding β-propeller 2, and merging the CBM, lectin-CRD and wedge-containing domains. In the end, we split T1169 into four evaluation units, as colored in Figure 7A. A long linker between D1 and D4 and the orphan helices in the middle of the cocoon (grey) were not assigned to any of the EUs.