3. NON-CANONICAL CODING RNA MODIFICATIONS: DISTRIBUTION, DYNAMISM AND FUNCTION.
3.1. N6-Methyladenosine. N6-methyladenosine (m6A) is the most abundant internal modification detected in mammalian mRNAs (0.2%–0.6% of all adenosines) (Śledź and Jinek, 2016). Its abundance, together with the development of robust detection methods, has led to intense research interest, and m6A is now the best-characterized RNA modification. It consists of the addition of a methyl group at the nitrogen-6 position of adenosine (Figure 1). The methyltransferase-like 3 (METTL3)–METTL14 heterodimer carries out the methylation, with METTL3 as the catalytic subunit and METTL14 as the RNA-binding scaffold for substrate recognition (Śledź and Jinek, 2016). Another m6A writer protein is METTL16, a U6 snRNA m6A methyltransferase. METTL16 is involved in regulating the cellular levels of S-adenosylmethionine (SAM), the methyl donor for methylation, as well as in mRNA splicing (Pendleton et al., 2017). Apart from passive m6A demethylation of the transcriptome, this modification is actively removed by the demethylases fat mass and obesity-associated protein (FTO) (Jia et al., 2011) and AlkB homologue 5 (ALKBH5) (Zheng et al., 2013). FTO and ALKBH5 are dioxygenases known to demethylate N-methylated nucleic acids. m6A readers have also been identified, including m6A-binding proteins belonging to the YTH family (YTHDF and YTHDC proteins) (Xiao et al., 2016), IGF2BP proteins (Huang et al., 2018), and some heterogeneous nuclear ribonucleoproteins (hnRNP) (Alarcón et al., 2015a).
Generally, m6A deposition on mRNA occurs in a sequence-dependent manner, mainly in the coding regions (CDS) and 3’ untranslated regions (UTR), with a significant enrichment just upstream of the stop codon (Dominissini et al., 2012; Meyer et al., 2012). Interestingly, trimethylation of histone H3 at Lys36 (H3K36me3) has been reported to direct m6A deposition to specific genomic sequences by recruiting the METTL14 complex (Huang et al., 2019a). Chromatin immunoprecipitation (ChIP)-sequencing studies demonstrated that approximately 70% of m6A peaks overlapped with H3K36me3 sites (Huang et al., 2019a). Altogether, the association between histone H3K36me3 and m6A RNA methylation adds a new layer of complexity to the control of gene expression. Research that integrates epigenetic and epitranscriptomic signals to explain gene control is expected in the near future.
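As an illustration of what sequence-dependent deposition means in practice, the sketch below scans an RNA sequence for candidate m6A sites. It assumes the commonly reported DRACH consensus (D = A/G/U, R = A/G, H = A/C/U), which this section does not itself specify; the function name find_drach_sites is hypothetical. A motif match only marks a candidate adenosine, not a confirmed methylation event.

```python
import re

# Assumption: DRACH consensus (D = A/G/U, R = A/G, H = A/C/U). The text only
# states that deposition is sequence-dependent; the motif is supplied here
# for illustration.
# The lookahead makes the pattern zero-width, so overlapping motif
# occurrences are all reported.
DRACH = re.compile(r"(?=([AGU][AG]AC[ACU]))")


def find_drach_sites(rna: str) -> list[int]:
    """Return 0-based positions of the central adenosine of each DRACH match."""
    seq = rna.upper().replace("T", "U")  # accept DNA-style input as well
    return [m.start() + 2 for m in DRACH.finditer(seq)]


# Two motif instances (GGACU and GGACA) in a toy sequence.
print(find_drach_sites("GGACUGGACA"))
```

In real transcripts only a fraction of such motif instances is methylated, so a scan like this is at best a first filter before peak-level evidence is considered.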
The wide range of readers could explain why m6A is involved in almost all aspects of post-transcriptional gene regulation and the mRNA life cycle, including mRNA stability, splicing and translation. For instance, the m6A readers YTHDF1 and YTHDF2 control mRNA stability during stem cell differentiation and modulate processes such as haematopoietic stem and progenitor cell specification (Zhang et al., 2017a; Li et al., 2018b), neural induction from induced pluripotent stem cells (Heck et al., 2020), mammalian spermatogenesis (Hsu et al., 2017) or circadian regulation of downstream genes involved in lipid metabolism (Zhong et al., 2018). By recognizing m6A on pre-mRNA, YTHDC1, hnRNPC, hnRNPG, and hnRNPA2B1 could also modulate mRNA splicing (Alarcón et al., 2015a; Liu et al., 2015; Xiao et al., 2016). YTHDC1 could also mediate nuclear export of processed RNAs to the cytoplasm (Roundtree et al., 2017b). In addition to regulating RNA stability and splicing, m6A reader proteins, including YTHDF1, YTHDF3, IGF2BP1/2/3 and YTHDC2, supervise RNA translation and RNA decay (Shi et al., 2017; Huang et al., 2018). Strikingly, the deposition of m6A in 3’ UTRs suggests that m6A could be incorporated into specific miRNA target sequences to modulate miRNA binding (Alarcón et al., 2015b). Conversely, microRNAs have been reported to regulate m6A modification via a sequence pairing mechanism and thereby influence cell reprogramming to pluripotency (Chen et al., 2015). This finding reinforces the crosstalk between the epigenome and epitranscriptome in the control of gene regulation.
3.2. N1-Methyladenosine. The N1-methyladenosine modification (m1A), the addition of a methyl group at the nitrogen-1 position of adenosine (Figure 1), was described decades ago and affects all classes of RNA (Barbieri and Kouzarides, 2020). It is predominant in tRNA and rRNA, but it was recently determined that it also exists in mRNA (Boccaletto et al., 2018). There is still very little information on its frequency, the key players involved in m1A regulation and its consequences in mRNA. Although its frequency in cytosolic mRNA is controversial, it is accepted that m1A is about ten times less abundant than m6A (Dominissini et al., 2016; Safra et al., 2017). The m1A modification maps uniquely to GC-rich positions in the 5’ UTRs of coding transcripts (Safra et al., 2017). Notably, unlike m6A, m1A occurs at the Watson-Crick interface and carries a positive charge on the base at this position (Roundtree et al., 2017a). Alterations in protein-RNA interactions and in RNA secondary/tertiary structures could therefore be expected. The role of the m1A modification is still being elucidated; however, recent work has described a function in the initiation of mRNA translation (Dominissini et al., 2016; Li et al., 2016b) by facilitating non-canonical binding of the exon-exon junction complex at 5’ UTRs devoid of 5’ proximal introns (Cenik et al., 2017). A regulatory role is supported by its high conservation in mouse and human cells (Cenik et al., 2017).
The only known m1A writer of cytosolic mRNA is the TRM6-TRM61 complex; however, its activity also covers m1A in mitochondrial-encoded transcripts (Li et al., 2017a; Safra et al., 2017). The m1A modification can be removed from mRNA by ALKBH3, an m1A demethylase active on both mRNA and tRNA (Dominissini et al., 2016; Li et al., 2016b; Esteve-Puig et al., 2020). The YTH protein family of m6A readers could also interpret the m1A signal. Specifically, YTHDF1-3 and YTHDC1 were shown to bind directly to m1A in mRNA in human cancer cells (Dai et al., 2018). New insights into the functions of m1A in RNA biology are needed; so far, only a role in the response to various types of cellular stress has been proposed (Dominissini et al., 2016; Li et al., 2016b).
3.3. 5-Methylcytosine. As in DNA, all types of RNA molecules can be methylated at carbon 5 of cytosine, giving rise to 5-methylcytosine (m5C) (Figure 1), with diverse functions depending on the RNA species (Trixl and Lusser, 2019). The abundance of m5C in mRNA is under strong debate, and discrepancies stem from the technical difficulty of mapping m5C across the transcriptome, mainly due to incomplete conversion of cytidine and m5C during bisulfite treatment. It is estimated that about 62-70% of cytosine sites have low methylation levels (<20% methylation), while 8-10% of the sites are moderately or highly methylated (>40% methylation) (Huang et al., 2019b). m5C modifications map primarily to the CDS, although an enrichment has also been observed in the 5’ UTR and 3’ UTR regions (Huang et al., 2019b).
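The per-site methylation levels quoted above can be read as a simple ratio. The sketch below assumes the usual bisulfite readout, in which a cytosine that resists conversion is read as C and counted as methylated, while a converted one is read as T and counted as unmethylated. The function names, and the label "intermediate" for the 20-40% range that the text leaves unnamed, are illustrative. Because conversion is incomplete in practice, a raw ratio of this kind can misestimate the true methylation level.

```python
def methylation_level(c_reads: int, t_reads: int) -> float:
    """Fraction of reads at a cytosine site that still read as C after bisulfite."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this site")
    return c_reads / total


def classify_site(level: float) -> str:
    """Bin a site using the thresholds quoted in the text (<20% and >40%)."""
    if level < 0.20:
        return "low"
    if level > 0.40:
        return "moderate_or_high"
    return "intermediate"  # hypothetical label; the text does not name this range


# Toy counts: 3 unconverted (C) and 17 converted (T) reads at one site.
level = methylation_level(3, 17)
print(level, classify_site(level))
```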
The writers of RNA m5C modifications in mammals include seven members of the NOL1/NOP2/SUN domain (NSUN) family (NSUN1-7) and DNA methyltransferase-like 2 (DNMT2). However, so far only NSUN2 has been shown to methylate mRNA (Yang et al., 2017b). In this regard, only overexpression or suppression of NSUN2, but not of any other NSUN enzyme, affected overall m5C levels in mRNA from HeLa cells (Yang et al., 2017b). Enzymes that remove m5C from RNA species have not yet been identified.
As we are only beginning to uncover the biology of m5C in mRNA, not much is known about its potential functional consequences. A role for m5C in the regulation of nuclear export has been discovered (Yang et al., 2017b). Specifically, the activity of the nuclear export factor ALYREF/THOC4 is strongly affected by the m5C level of its target mRNAs (Yang et al., 2017b). m5C deposition is not a random event, since m5C accumulates at the translational start codon and in a CG sequence context. In addition, m5C can act as a modulator of protein translation. Examples include m5C accumulation at the 5’ UTR of the cyclin-dependent kinase inhibitor p27KIP1 during replicative senescence (Tang et al., 2015), and m5C deposition in the 3’ UTRs of the cell cycle regulators CDK1 and p21 during the cell division cycle (Xing et al., 2015).
Physiologically, NSUN2 is involved in multiple biological pathways. It has been identified as a direct target gene of the transcription factor Myc, and its activation is relevant for the differentiation of primary human keratinocytes (Frye and Watt, 2006). Mouse models with Nsun2 knockdown exhibit additional developmental defects, including impaired cerebral cortex organization and an immature skeleton, among others (Tuorto et al., 2012). Nsun2 has also been implicated in testis differentiation (Hussain et al., 2013). The molecular mechanisms connecting NSUN2 deficiencies and impaired cell differentiation have not been identified.
3.4. Pseudouridine. Pseudouridylation is the isomerization of the uridine base via breakage of the glycosidic bond, 180° base rotation, and bond reformation (Hamma and Ferré-D’Amaré, 2006) (Figure 1). It is the most frequent modification in total human RNA; however, the mapping of pseudouridine (ψ) in mRNAs has only recently been addressed (Penzo et al., 2017). Methodological limitations have generated serious controversy about the distribution and abundance of ψ, but the general consensus is that ψ sites in mRNA are much less abundant than m6A sites (Schwartz et al., 2014). Besides mRNAs, non-coding RNAs (ncRNAs) have emerged as highly interesting targets with ψ sites (Rintala-Dempsey and Kothe, 2017). The enzymology associated with pseudouridylation is very complex. In eukaryotes, uridine is transformed into ψ by a class of enzymes known as pseudouridylases, which are represented in humans by pseudouridine synthases (PUS) encoded by 13 genes. Human PUS enzymes are far less studied than their counterparts in other organisms, but recent discoveries have allowed a better identification of PUS enzymes, including those acting on mRNA (PUS1, PUS3, PUS4, PUS6, PUS7 and PUS9) (Penzo et al., 2017). Their mode of action and the potential redundancy in their functions have not yet been completely resolved (Carlile et al., 2014a; Penzo et al., 2017). Currently, no specific eraser or reader associated with ψ modifications has been identified (Barbieri and Kouzarides, 2020).
It is well known that ψ enhances the function of tRNA and rRNA by stabilizing the RNA structure, and that it regulates splicing by modifying specific snRNAs (Carlile et al., 2014b; Barbieri and Kouzarides, 2020). The physiological relevance of ψ in mRNA is less clear, with only limited evidence of its role. Mutations in genes encoding human PUS enzymes cause inherited diseases affecting muscle and brain function, which reinforces their emerging role as regulators of gene expression (Shaheen et al., 2019). Notably, ψ content in the 3’ UTRs of mRNAs is regulated in response to environmental signals, such as serum starvation in human cells, suggesting a function in the flexible adaptation of the genetic code through inducible mRNA modifications (Carlile et al., 2014b). A role in mRNA translation through the control of ribosome pausing and RNA localization has also been suggested (Carlile et al., 2014b; Schwartz et al., 2014).
3.5. Adenosine-to-inosine editing. Another RNA modification in mammals is the irreversible deamination of adenosine to inosine, a process also known as A-to-I editing (Figure 1). A-to-I editing occurs in multiple genomic sequences, ranging from coding regions of mRNAs to non-coding regions (e.g., Alu repeats, pre-miRNAs or pri-miRNAs) (Nishikura, 2016a). Inosine is interpreted by the cell as guanosine and, consequently, A-to-I editing could alter the biogenesis and/or function of miRNAs or mRNAs as well as proteins (Nishikura, 2016b). However, a comparative study of A-to-I modifications among animals revealed that non-coding parts of the genome are the main targets of the editing process. A role in protecting against activation of innate immunity by self-transcripts has been proposed (Eisenberg and Levanon, 2018). A second type of A-to-I editing is hyper-editing, which can be understood as editing-enriched regions (Porath et al., 2014). A large proportion of adenosines in close proximity to each other within the same transcript is a requisite for hyper-editing. In mammals, this class of editing is mostly associated with regions of repetitive sequences, intronic regions and 3’ UTRs (Porath et al., 2017).
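The consequence of inosine being read as guanosine can be made concrete with a small translation sketch. It assumes the standard genetic code and treats every I as G before decoding; the function names are hypothetical and the codons are toy examples, not sites taken from the studies cited above.

```python
# Standard genetic code, built from the conventional 64-letter string with
# bases ordered U, C, A, G at each codon position ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def read_inosine_as_guanosine(rna: str) -> str:
    """Model how an edited transcript is decoded: each inosine (I) is read as G."""
    return rna.upper().replace("I", "G")


def translate(rna: str) -> str:
    """Translate an RNA string codon by codon, ignoring a trailing partial codon."""
    seq = read_inosine_as_guanosine(rna)
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Editing the middle adenosine of a CAG codon (Gln) yields CIG, decoded as CGG (Arg).
print(translate("AUGCAGUAA"), translate("AUGCIGUAA"))
```

Not every edit recodes a protein: an inosine at a synonymous position leaves the amino acid unchanged, which is consistent with most editing falling in non-coding sequence.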
A-to-I editing is catalysed by the adenosine deaminase acting on dsRNA (ADAR) family of proteins. ADAR1 and ADAR2 are the catalytically active proteins, whereas ADAR3 lacks editing activity and may act as a negative regulator of ADAR1 and ADAR2 activity (Nishikura, 2016b). Both ADAR1 and ADAR2 have essential roles in cellular differentiation. In mammals, ADAR1 is widely expressed, especially in the myeloid component of the blood system, and plays a prominent role in promiscuous editing of long dsRNA (Zipeto et al., 2016). Additional studies indicate that ADAR1 forms a complex with Dicer to promote miRNA processing (Ota et al., 2013). ADAR2 is more highly expressed in the brain and is primarily required for site-specific editing of key transcripts for central nervous system development (Behm et al., 2017). A role for ADAR2 in the control of the circadian clock has also been revealed (Terajima et al., 2017).