Remote homology detection of insulin and IR
Popular sequence alignment methods such as BLAST are not suitable for
detecting sequence conservation between coral and human proteins,
because conservation between them is low, with sequence identities in the
20-30% range. The method of choice for retrieving coral homologs of
human proteins was HHblits (Remmert et al., 2011), a hidden Markov model
(HMM)-based alignment approach from the group of Johannes Soeding that
builds on the HMM-HMM comparison method he introduced in 2005.
Unlike traditional profile HMM methods, which compare a query profile HMM
against individual sequences, HHblits represents both the query and the
database templates as profile HMMs. The query HMM is built from a multiple
sequence alignment of the query and its homologs and encodes the amino
acid distribution at each position. Homologs are therefore detected by
HMM-HMM alignment, which makes the method highly sensitive. HHblits has
been shown to outperform traditional profile HMM approaches such as HMMER3
in both the detection and the alignment of remote homologs
(Remmert et al., 2011).
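To make the HMM-HMM comparison more concrete, the sketch below scores two profile columns with a log-odds of their summed amino acid probability products over background frequencies, log Σ_a q(a)t(a)/f(a), a column score of the kind used in HMM-HMM comparison, and plugs it into a simple Smith-Waterman-style local alignment of two profiles. It is a simplified illustration under stated assumptions, not the HHblits implementation: the uniform background, the flat gap penalty (standing in for HMM transition probabilities), the absence of pseudocounts and the toy profiles are all hypothetical choices made for brevity.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequencies f(a); real tools use empirical ones.
BACKGROUND = {aa: 1.0 / len(ALPHABET) for aa in ALPHABET}


def column_score(q_col, t_col, background=BACKGROUND):
    """Log-odds similarity of two profile columns, log(sum_a q(a) * t(a) / f(a)).

    Columns are dicts mapping residues to probabilities. The score is positive
    when both columns favour the same residues more than expected by chance.
    Without pseudocounts, columns sharing no residue score -inf.
    """
    s = sum(q_col.get(a, 0.0) * t_col.get(a, 0.0) / background[a] for a in ALPHABET)
    return math.log(s) if s > 0 else float("-inf")


def local_profile_alignment(query, template, gap=-1.0):
    """Best Smith-Waterman-style local alignment score between two profiles.

    `query` and `template` are lists of columns. The linear gap penalty is a
    hypothetical stand-in for the HMM transition probabilities used by HHblits.
    """
    n, m = len(query), len(template)
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = H[i - 1][j - 1] + column_score(query[i - 1], template[j - 1])
            H[i][j] = max(0.0, match, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


# Toy profiles, invented for illustration only.
q = [{"C": 0.9, "S": 0.1}, {"G": 0.5, "A": 0.5}, {"L": 0.6, "I": 0.4}]
t = [{"C": 0.8, "S": 0.2}, {"G": 0.6, "A": 0.4}, {"L": 0.5, "I": 0.5}]
print(round(local_profile_alignment(q, t), 2))
```

Because each column carries a full residue distribution rather than a single amino acid, two columns can score well even when their most frequent residues differ, which is the intuition behind the added sensitivity of profile-profile comparison.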
Given the approximately 700 million years of evolution separating corals
and humans, the sensitivity of HHblits was instrumental for the present
study. The results of the HHblits search of the pdam genome for homologs
of human insulin (UniProt ID P01308) and the human insulin receptor (IR;
UniProt ID P06213) are provided in the Supplementary Materials, with
filenames matching the UniProt IDs.
Figure 1 shows the sequence alignment of human insulin with the pdam
protein pdam_00013976. Similarly, IR was aligned with pdam_00006633 with
high confidence (data not shown). In both cases, the alignments cover a
large fraction of the sequence: 1164 of 1382 amino acids for IR and 101 of
110 for insulin.
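The coverage figures above are the fraction of the full-length query that falls into aligned columns. A minimal sketch, assuming the aligned query segment is available as a gapped string with '-' marking gaps (the helper names are hypothetical), reproduces the reported percentages from the residue counts:

```python
def aligned_residue_count(aligned_query: str) -> int:
    """Count residues in a gapped alignment row, assuming '-' marks a gap."""
    return sum(1 for ch in aligned_query if ch != "-")


def coverage_percent(aligned_residues: int, query_length: int) -> float:
    """Percentage of the full-length query covered by the alignment."""
    return 100.0 * aligned_residues / query_length


# Residue counts reported in the text (IR: P06213, insulin: P01308).
ir_cov = coverage_percent(1164, 1382)
ins_cov = coverage_percent(101, 110)
print(f"IR: {ir_cov:.1f}%  insulin: {ins_cov:.1f}%")  # IR: 84.2%  insulin: 91.8%
```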
Manual inspection of the alignments also shows clear matching of similar
residues despite the low overall sequence identity. Such alignments could
not be obtained with conventional profile HMM methods (data not shown). To
validate the alignment, we extracted the insulin residues known to be
involved in receptor binding and found that they map to regions of
high-confidence alignment. The residue motifs highlighted in different
colors in Figure 1 were identified as important for receptor binding by
cryo-electron microscopy
(Uchikawa et al., 2019).
These functionally important motifs overlap well with the regions of
high-confidence alignment, as given by the HHblits confidence values
(0 = lowest, 9 = highest). The same analysis for the insulin receptor
again showed overlap between the ligand-binding residues and regions of
high-confidence alignment (data not shown).
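The validation step described above, reading off the alignment confidence at each known binding residue, can be sketched as follows. This is a minimal illustration, assuming the query row of the pairwise alignment and a matching confidence row (one character per alignment column, digits 0-9, any other character treated as missing) have already been extracted from the HHblits output, and that the residue number of the first aligned query residue is known. The function names, the threshold of 7 and the toy rows are hypothetical and not taken from the study.

```python
def confidence_at_positions(query_row, confidence_row, query_start, positions):
    """Map 1-based query residue numbers to alignment confidence values.

    query_row      -- aligned query sequence, '-' marking gaps
    confidence_row -- one character per alignment column; digits 0-9,
                      any other character is treated as a missing value
    query_start    -- residue number of the first aligned query residue
    positions      -- residue numbers of interest (e.g. binding residues)

    Returns {position: confidence}, with None for positions that are outside
    the aligned region or have no confidence value.
    """
    if len(query_row) != len(confidence_row):
        raise ValueError("query and confidence rows must have equal length")
    wanted = set(positions)
    result = dict.fromkeys(positions)  # keeps the caller's order, values None
    residue = query_start - 1
    for aa, conf in zip(query_row, confidence_row):
        if aa == "-":
            continue  # gap in the query row: no query residue is consumed
        residue += 1
        if residue in wanted and conf in "0123456789":
            result[residue] = int(conf)
    return result


def fraction_high_confidence(conf_by_pos, threshold=7):
    """Fraction of positions with a value >= threshold (hypothetical cutoff)."""
    values = [c for c in conf_by_pos.values() if c is not None]
    if not values:
        return 0.0
    return sum(c >= threshold for c in values) / len(values)


# Toy alignment rows, invented for illustration.
conf = confidence_at_positions("ACDE-FGH", "99871988", 1, [2, 6, 10])
print(conf, fraction_high_confidence(conf))
```

Positions outside the aligned segment come back as None rather than being dropped, so unaligned binding residues remain visible in the result instead of silently inflating the high-confidence fraction.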
We also found that the cysteine pair forming the human Cys647-Cys860
disulfide bond, which connects FnIII-2a and FnIII-3, is conserved in the
coral sequence. We had previously proposed that this disulfide bond acts
as a signaling bridge important for receptor activation
(Ye et al., 2017), a
hypothesis that was validated by the recent structural analysis
(Uchikawa et al., 2019).
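Checking whether a pair of human cysteines is conserved amounts to mapping each human residue number through the pairwise alignment to the corresponding coral residue and testing that both residues are cysteines in both sequences. A minimal sketch, assuming equal-length gapped query (human) and template (coral) rows with '-' gaps and known starting residue numbers; the function names and the toy alignment are hypothetical.

```python
def map_query_to_template(query_row, template_row, query_start, template_start):
    """Map each aligned query residue number to (query_aa, template_number, template_aa).

    Rows are equal-length gapped strings with '-' for gaps; the start arguments
    are the residue numbers of the first residue in each row.
    """
    if len(query_row) != len(template_row):
        raise ValueError("alignment rows must have equal length")
    mapping = {}
    qi, ti = query_start - 1, template_start - 1
    for q, t in zip(query_row, template_row):
        if q != "-":
            qi += 1
        if t != "-":
            ti += 1
        if q != "-" and t != "-":
            mapping[qi] = (q, ti, t)
    return mapping


def conserved_cysteine_pair(query_row, template_row, query_start, template_start, pair):
    """Return the template residue numbers aligned to a query Cys-Cys pair,
    or None if either position is unaligned or is not Cys in both sequences."""
    mapping = map_query_to_template(query_row, template_row, query_start, template_start)
    template_positions = []
    for pos in pair:
        entry = mapping.get(pos)
        if entry is None:
            return None
        q_aa, t_pos, t_aa = entry
        if q_aa != "C" or t_aa != "C":
            return None
        template_positions.append(t_pos)
    return tuple(template_positions)


# Toy alignment, invented for illustration (query numbering starts at 10).
print(conserved_cysteine_pair("ACG-CW", "SCGKCW", 10, 100, (11, 13)))  # (101, 104)
```

Applied to the receptor, `pair` would be (647, 860), with the rows and start positions taken from the alignment and numbered in the same convention as the reported residues. Aligned cysteines are consistent with, but do not by themselves demonstrate, a disulfide bond in the coral protein.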