2.1 HNF4α P1 and P2 subgroups of isoforms and their origins
Since their discovery, HNF4α isoforms have been referred to by various nomenclatures, mainly depending on the organism from which they were isolated. This historical confusion led to a cumulative lack of uniformity in how they are described in the literature and in the main repository databases, in some cases producing significant discrepancies in the gene transcript sequences assigned to specific isoforms [31]. To resolve the confusion surrounding the classification of HNF4α isoforms, a nomenclature was recently proposed that organizes them according to the specificity of their N-terminal and C-terminal regions [31, 32]. From partial sequencing of the HNF4A locus, six putative isoforms were originally predicted to be produced by alternative splicing [33]. These included the 455-amino-acid rat protein, equivalent to the 464-amino-acid human protein called HNF-4B, which was isolated and purified from rat liver nuclear extracts [24] and later renamed HNF4α1 (P1a-α1, Figure 1B). In addition, HNF4α2 (P1a-α2, Figure 1B) was initially described in rat liver as an isoform containing a 10-amino-acid insertion in the C-terminal region of the protein [34, 35]. This protein was subsequently described as HNF-4CL4, a 474-amino-acid protein isolated from human liver [36]. HNF4α3 (P1a-α3, Figure 1B) was first described in human liver as an isoform containing a 40-amino-acid insertion starting at position 369, and was initially called HNF4C [27]. HNF4α4 (P1b-α4, Figure 1B) included an additional 30-amino-acid sequence in its N-terminal region compared with the P1a-α1 to α3 isoforms [27]. More recently, an error was identified in the initially reported sequence of P1b-α4, which predicted a premature stop codon and a truncated protein.
The correct sequence would have to contain an alternative start codon, leading to a protein whose N-terminal sequence differs from the one initially described [31]. The HNF4α5 (P1b-α5, Figure 1B) and HNF4α6 (P1b-α6, Figure 1B) isoforms were deduced from the knowledge of alternative splicing mechanisms gained from the first isolated isoforms. As initially reported, these isoforms contain exons 1B and 1C in their N-terminal region; P1b-α5 carries the same insertion described for P1a-α2, and P1b-α6 the same insertion described for P1a-α3 [33]. Subsequently, a gene transcript containing a 154 bp sequence variant in the N-terminal region, distinct from all HNF4α isoforms described at that time, was isolated from immortalized murine liver cells and named HNF4α7 (P2a-α7, Figure 1B) [37]. By extrapolating further combinations of possible splicing events, two additional isoforms were described: HNF4α8 (P2a-α8, Figure 1B) and HNF4α9 (P2a-α9, Figure 1B), both containing an N-terminal region identical to that of HNF4α7 (exon 1D, Figure 1B). The C-terminal regions of these two isoforms differ: that of HNF4α8 is identical to the C-terminal region of HNF4α2, and that of HNF4α9 is identical to the C-terminal region of HNF4α3 [38]. With the discovery of an additional promoter, P2, in the HNF4A locus [29], most subsequent studies began to distinguish between P1 (α1 to α6) and P2 (α7 to α9) isoforms. Three additional P2-driven isoforms (P2b-α10, P2b-α11, and P2b-α12) were reported to include both exons 1D and 1E in their N-terminal region (Figure 1B) [30]. The variable regions among these isoforms were again located in the C-terminal region: HNF4α10 contains the C-terminal region common to HNF4α1 and α7, HNF4α11 the C-terminal region common to HNF4α2 and α8, and HNF4α12 the C-terminal region common to HNF4α3 and α9 (Figure 1B).
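The combinatorial logic of this nomenclature, in which four N-terminal classes are each paired with three C-terminal variants, can be illustrated with a short sketch. The function name `build_isoform_table`, the dictionary keys, and the labels "α1-type", "α2-type", and "α3-type" are hypothetical and introduced here for illustration only. The sketch also assumes that P1b-α4 carries the α1-type C-terminal region, which follows the pattern described above but is not stated explicitly. It tracks names only and says nothing about the underlying sequences.

```python
from itertools import product

# N-terminal classes in the order the isoforms are numbered (alpha1 to alpha12).
N_TERMINAL_CLASSES = ["P1a", "P1b", "P2a", "P2b"]

# Hypothetical labels for the three C-terminal variants described in the text:
# the alpha1 form, the alpha2 form (10-amino-acid insertion), and the
# alpha3 form (40-amino-acid insertion).
C_TERMINAL_VARIANTS = ["α1-type", "α2-type", "α3-type"]


def build_isoform_table():
    """Return a dict mapping each isoform name to its promoter and end regions.

    Assumption: every N-terminal class pairs with all three C-terminal
    variants, including P1b-alpha4 with the alpha1-type C-terminal region.
    """
    table = {}
    pairs = product(N_TERMINAL_CLASSES, C_TERMINAL_VARIANTS)
    for number, (n_class, c_variant) in enumerate(pairs, start=1):
        name = f"{n_class}-α{number}"
        table[name] = {
            "promoter": n_class[:2],  # "P1" or "P2"
            "n_terminal_class": n_class,
            "c_terminal_variant": c_variant,
        }
    return table


if __name__ == "__main__":
    for name, info in build_isoform_table().items():
        print(name, info["promoter"], info["c_terminal_variant"])
```

Under these assumptions, the isoform number is determined by the position of the N-terminal class and the C-terminal variant, so that, for example, P2a-α8 and P2b-α11 both fall in the α2-type column.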