2.1 HNF4α P1 and P2 subgroups of isoforms and their origins
Since their discovery, HNF4α isoforms have been referred to by various
nomenclatures, mainly depending on the organism from which they were
isolated. This historical confusion contributed to a persistent lack of
uniformity in their description, both in the literature and in the main
repository databases, leading in some cases to significant discrepancies
in the matching of specific gene transcript sequences to some of
these isoforms [31]. To resolve the confusion surrounding the
classification of each HNF4α isoform, a nomenclature was recently
proposed that organizes these isoforms according to the specificity of
their N-terminal and C-terminal regions [31, 32]. From partial
sequencing of the HNF4A locus, six putative isoforms were
originally predicted to be produced by alternative splicing [33].
These included the 455-amino-acid rat protein, equivalent to the human
464-amino-acid protein called HNF-4B, which was isolated and purified from rat
liver nuclear extracts [24] and later renamed HNF4α1 (P1a-α1,
Figure 1B). In addition, HNF4α2 (P1a-α2, Figure 1B) was initially
described in rat liver as an isoform containing an insertion of 10 amino
acids in the C-terminal region of the protein [34, 35]. This protein
was subsequently described as HNF-4CL4, a 474-amino-acid protein isolated
from human liver [36]. HNF4α3 (P1a-α3, Figure 1B) was first
described in human liver as an isoform containing a 40-amino-acid
insertion starting at position 369 and was initially
called HNF4C [27]. HNF4α4 (P1b-α4, Figure 1B) included an additional
sequence of 30 amino acids in its N-terminal region compared with the P1a-α1
to α3 isoforms [27]. More recently, an error was found in the
initially reported sequence of P1b-α4 that predicted a premature stop
codon and a truncated protein. The correct sequence would have to
contain an alternative start codon, leading to the production of a
protein whose N-terminal sequence differs from what was
initially described [31]. HNF4α5 (P1b-α5, Figure 1B) and HNF4α6
(P1b-α6, Figure 1B) isoforms were deduced from the knowledge of alternative
splicing mechanisms obtained from the first isolated isoforms. As
initially reported [33], these isoforms contain exons 1B and 1C in
their N-terminal region, while P1b-α5 contains the same insertion
described for P1a-α2 and P1b-α6 contains the same insertion described for
the P1a-α3 isoform [33]. Subsequently, a gene transcript containing a
154 bp sequence variant in the N-terminal region, different from all
HNF4α isoforms described at that time, was isolated from immortalized
murine liver cells and named HNF4α7 (P2a-α7, Figure 1B) [37]. By
extrapolation from additional combinations of possible splicing events,
two additional isoforms were described: HNF4α8 (P2a-α8, Figure 1B) and
HNF4α9 (P2a-α9, Figure 1B), both containing an N-terminal region
identical to that of HNF4α7 (exon 1D, Figure 1B). The C-terminal regions of
these two isoforms differed: HNF4α8 had a C-terminal region identical
to that of the HNF4α2 isoform, and HNF4α9 a C-terminal region
identical to that of the HNF4α3 isoform [38]. With the discovery of an additional
P2 promoter located in the HNF4A locus [29], most subsequent
studies started distinguishing between P1 (α1 to α6) and P2 (α7 to α9)
isoforms. Three additional P2-driven isoforms (P2b-α10, P2b-α11, and
P2b-α12) were reported to include both exons 1D and 1E in their
N-terminal region (Figure 1B) [30]. The variable regions among these
isoforms were again located in the C-terminal region: HNF4α10
contained the C-terminal region common to HNF4α1 and α7, HNF4α11
the C-terminal region common to HNF4α2 and α8, and HNF4α12 the
C-terminal region common to HNF4α3 and α9 (Figure 1B).
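
The combinatorial logic of this nomenclature can be summarized in a short Python sketch, offered only as an illustration and not as part of the proposed nomenclature itself [31, 32]. The function and variable names are hypothetical, the "α1-type", "α2-type" and "α3-type" labels are shorthand introduced here for the three C-terminal variants, and the index arithmetic is an assumption that simply reproduces the numbering described above. The exon composition of the P1a N-terminal region is not stated in this section and is therefore left unspecified, and the C-terminal assignment of α4 is inferred from the overall pattern rather than stated explicitly in the text.

```python
# Illustrative sketch only: names and index arithmetic are assumptions
# chosen to reproduce the alpha1-alpha12 numbering described in the text.

# N-terminal classes, in the order in which isoform numbers are assigned.
N_TERMINAL_CLASSES = ("P1a", "P1b", "P2a", "P2b")

# Exons named in the text for each N-terminal class. The text does not
# name the exon(s) of P1a, so that entry is deliberately left as None.
N_TERMINAL_EXONS = {
    "P1a": None,
    "P1b": ("1B", "1C"),
    "P2a": ("1D",),
    "P2b": ("1D", "1E"),
}

# C-terminal variants, labeled by the first isoform that carries them
# (hypothetical shorthand, not part of the published nomenclature).
C_TERMINAL_VARIANTS = ("α1-type", "α2-type", "α3-type")


def _check(n: int) -> None:
    """Reject isoform numbers outside the twelve described in the text."""
    if not 1 <= n <= 12:
        raise ValueError("isoform number must be between 1 and 12")


def n_terminal_class(n: int) -> str:
    """Return the N-terminal class (P1a, P1b, P2a or P2b) of isoform n."""
    _check(n)
    return N_TERMINAL_CLASSES[(n - 1) // 3]


def c_terminal_variant(n: int) -> str:
    """Return the C-terminal variant of isoform n.

    Note: the value for n = 4 is inferred from the pattern, not stated
    explicitly in the text.
    """
    _check(n)
    return C_TERMINAL_VARIANTS[(n - 1) % 3]


def promoter(n: int) -> str:
    """Return the promoter subgroup (P1 or P2) of isoform n."""
    return n_terminal_class(n)[:2]


def isoform_label(n: int) -> str:
    """Return the combined label, e.g. 'P2a-α7' for HNF4α7."""
    return f"{n_terminal_class(n)}-α{n}"


if __name__ == "__main__":
    for n in range(1, 13):
        print(isoform_label(n), promoter(n), c_terminal_variant(n))
```

Under these assumptions, `isoform_label(8)` yields `P2a-α8`, and `c_terminal_variant(8)` equals `c_terminal_variant(2)`, mirroring the statement that HNF4α8 shares its C-terminal region with HNF4α2.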