Chinese Chemical Letters 2018, Vol. 29 Issue (7): 1029-1032
Distribution of micropeptide-coding sORFs in transcripts
Xinqiang Yin a,b, Jialiang Hu a,c, Hanmei Xu a,c
a The Engineering Research Center of Peptide Drug Discovery and Development, China Pharmaceutical University, Nanjing 210009, China;
b The Basic Medical School, North Sichuan Medical College, Nanchong 673000, China;
c State Key Laboratory of Natural Medicines, Ministry of Education, China Pharmaceutical University, Nanjing 210009, China
Abstract: Small open reading frames (sORFs) are distributed over a wide variety of transcripts. sORFs encoding functional peptides have been identified in various configurations within putative long noncoding RNAs. Many translated sORFs have also been identified across mRNAs, including in the 5'-upstream region, the coding domain, and the 3'-downstream region. sORFs have also been found in circular RNAs, pri-miRNAs, and ribosomal RNAs. Here, we present an overview of the wide distribution of sORFs in transcripts and their functional roles in organisms.
Key words: Micropeptide     sORFs     uORFs     ncRNA     circRNAs     Pri-miRNAs    
1. Introduction

A protein-coding open reading frame (ORF) comprises a start codon, in-frame codons and a stop codon [1, 2]. Peptides, typically defined as chains of fewer than 50 amino acids, are often obtained by processing longer precursors. However, hundreds of thousands of previously unannotated short or small open reading frames (sORFs) of fewer than 100 codons, which have the potential to encode peptides or small proteins, have been discovered in the genomes of many species. The products encoded by sORFs, fewer than 100 amino acids long, are named micropeptides [3, 4]. Unlike classical bioactive peptides, micropeptides are released directly into the cytoplasm because they lack an N-terminal signal sequence [5].
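The ORF definition above (a start codon, in-frame codons and a stop codon, with sORFs having fewer than 100 codons) can be illustrated with a minimal sketch. The function name `find_sorfs`, the restriction to AUG starts, the forward-strand-only scan, and the choice to exclude the stop codon from the codon count are assumptions made for illustration; this is not one of the prediction tools cited in this review.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_sorfs(seq, max_codons=100, start_codon="ATG"):
    """Return (start, end, frame) for every ORF with fewer than max_codons codons.

    start is the 0-based index of the start codon, end is the index just past
    the stop codon. The codon count excludes the stop codon (an assumption).
    Only the first start codon after each stop is used in a given frame.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    hits = []
    for frame in range(3):
        open_start = None  # first start codon seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if open_start is None and codon == start_codon:
                open_start = i
            elif open_start is not None and codon in STOP_CODONS:
                n_codons = (i - open_start) // 3
                if n_codons < max_codons:
                    hits.append((open_start, i + 3, frame))
                open_start = None
    return hits


if __name__ == "__main__":
    print(find_sorfs("CCAUGAAAUUUUAGGG"))
```

A scan of this kind reports every short ORF that is syntactically possible, which is why sequence alone cannot separate coding sORFs from the large number of noncoding ones.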

Micropeptides were overlooked for a long time for several reasons. First, short coding sequences were excluded from early genome annotation on the assumption that most coding genes encode proteins longer than 100 amino acids, and because it is difficult to identify bona fide protein-coding sORFs accurately and to distinguish them from the large number of putative noncoding ORFs. Second, their small size and low abundance make them difficult to detect. Furthermore, the use of alternative transcription start sites and processes such as alternative splicing, transcript editing, and post-translational modification make identification even more challenging [6-8].

Advanced bioinformatics and computational approaches have been successfully used to identify sORF-encoded peptides [9-18]. In addition, experimental techniques such as ribosome profiling [19-26], mass spectrometry and other proteomic methods [27-32] have been developed and fine-tuned to identify novel coding sORFs. Here, we present an overview of the wide distribution of sORFs in transcripts (Fig. 1) and their functional roles in organisms. Since the aforementioned methods have been described in detail in many articles [9-32], we do not explore them in this review.

Fig. 1. Overview of the distribution of small open reading frames (sORFs) in various transcripts

2. The location of sORFs in transcripts

2.1. sORFs in annotated long non-coding RNAs

Based on sequence length, non-coding RNAs (ncRNAs) are generally divided into two classes: short ncRNAs (sncRNAs) of fewer than 200 nt and long ncRNAs (lncRNAs) of more than 200 nt. Advances in transcriptomics have led to the discovery of a great number of lncRNAs, which play versatile roles in regulating gene expression [33].

Although ncRNAs were originally thought to be non-coding, recent studies have found that many of them contain sORFs, and some can even encode micropeptides. The gene annotated as noncoding RNA 003 in 2L (pncr003:2L) encodes two functional micropeptides of 28 and 29 amino acids, named SCL (Table 1). SCL regulates calcium transport and hence influences regular muscle contraction in the Drosophila heart [34].

Table 1
The sequences of the peptides discussed in the text

Olson's team has found five micropeptides in mouse transcripts, two of which are encoded by lncRNAs. Myoregulin (MLN) (Table 1), a highly conserved 46-amino-acid micropeptide encoded by a skeletal muscle-specific RNA annotated as a lncRNA, interacts directly with sarcoplasmic reticulum Ca2+-ATPase (SERCA) and impedes Ca2+ uptake into the sarcoplasmic reticulum, thereby regulating muscle contraction [35]. Another micropeptide, the 34-amino-acid DWORF, is encoded by a putative muscle-specific long noncoding RNA and enhances SERCA activity by displacing the SERCA inhibitors phospholamban (PLN), sarcolipin (SLN), and myoregulin (MLN) [36].

Toddler, a gene previously annotated as a non-coding RNA in vertebrates, encodes a 58-amino-acid micropeptide that activates APJ/Apelin receptor signaling and promotes gastrulation movements [37]. The lncRNA LINC00961, conserved across species, encodes the 90-amino-acid polypeptide SPAR, which regulates mTORC1 activation and muscle regeneration [38]. A recent study showed that NoBody, a conserved micropeptide encoded by LINC01420/LOC550643 RNA, interacts with mRNA decapping proteins via direct interaction with EDC4 [39]. These examples underscore the likelihood that many transcripts currently annotated as noncoding RNAs encode peptides with important biological functions.

2.2. sORFs in mRNAs

2.2.1. 5'-UTRs

Small ORFs in the 5' untranslated region of mRNAs are called upstream ORFs (uORFs). Predicting uORFs with sequence-based methods is challenging, since nearly half of them use non-AUG start codons in mammals. Ribosome profiling, however, can detect the various start codons directly by halting the ribosome at the start site [40, 41]. For a long time, uORFs were considered to be cis-acting elements that regulate translation of the downstream ORF [42]. Recent studies have demonstrated that nearly 50% of uORFs in human mRNAs are translated and that this translation is necessary to regulate downstream ORF expression [43, 44]. In general, uORFs reduce expression of the downstream protein by modulating translation efficiency [45] or by triggering mRNA decay [46, 47]. Under stress conditions, however, uORFs facilitate protein expression [48].
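To make concrete why non-AUG initiation complicates sequence-based prediction, the following minimal sketch enumerates candidate uORFs whose start codon lies in the 5'-UTR. The function name `find_uorfs`, the argument `cds_start` (0-based index of the annotated start codon), and the particular set of near-cognate start codons (CUG, GUG, UUG) are assumptions chosen for illustration, not a method taken from the studies cited here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Assumption: AUG plus three commonly reported near-cognate start codons.
START_CODONS = {"ATG", "CTG", "GTG", "TTG"}


def find_uorfs(transcript, cds_start, starts=START_CODONS):
    """List candidate uORFs whose start codon lies wholly inside the 5'-UTR.

    cds_start is the 0-based index of the annotated start codon.
    Every candidate start is reported, so nested candidates that share
    a stop codon appear as separate entries.
    """
    seq = transcript.upper().replace("U", "T")
    uorfs = []
    for i in range(0, cds_start - 2):
        if seq[i:i + 3] not in starts:
            continue
        # Walk downstream in frame until the first stop codon.
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                uorfs.append({
                    "start": i,
                    "end": j + 3,  # index just past the stop codon
                    "start_codon": seq[i:i + 3],
                    "overlaps_cds": j + 3 > cds_start,
                })
                break
    return uorfs


if __name__ == "__main__":
    mrna = "GGCTGAAATAGCCATGGCCTAA"  # toy transcript, annotated ATG at index 13
    print(find_uorfs(mrna, cds_start=13))
    print(find_uorfs(mrna, cds_start=13, starts={"ATG"}))
```

In the toy transcript the only candidate uORF starts at CUG, so a scan restricted to AUG finds nothing. Allowing near-cognate starts, on the other hand, multiplies the number of candidates, which is one reason experimental evidence such as ribosome profiling is needed to decide which of them are translated.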

Some uORF-encoded peptides also have biological functions. A 31-codon uORF in the mRNA of the mammalian gene chop encodes a 31-amino-acid peptide. This peptide reduces CHOP translation by interacting with the peptide exit tunnel of the ribosome, pausing the ribosome or dissociating it from the mRNA [49]. Another example is the MKKS gene, which generates two types of transcripts: a long transcript that encodes both the uORFs and MKKS, and a short transcript that encodes only the uORFs through the use of alternative polyadenylation sites in the 5'-UTR. Multiple uORFs of the long MKKS transcript function as translational repressors of MKKS. Two uORF-encoded products are imported onto the mitochondrial membrane, but their function needs further study [50]. A further example is a peptide encoded by a 5'-upstream short open reading frame, which regulates angiotensin type 1a receptor production and signals via the β-arrestin pathway [51]. These examples suggest that uORFs may be a rich source of peptides that play key roles in cells.

2.2.2. Overlapping and downstream sORFs

Mature mRNAs also contain unconventional open reading frames that overlap the reference ORF in the non-canonical +2 and +3 reading frames, so a single mRNA can yield more than one completely different peptide [52, 53]. Around 41% of human mRNAs contain at least one alternative ORF within the reference ORF, and most of these encode small proteins of fewer than 90 amino acids [52]. Overlapping sORFs may lie within a known ORF or extend from the known ORF into the 3' trailer sequence, and they represent another source of alternatively translated products. Eighty short peptides encoded by overlapping sORFs have been identified in proteomic studies [52, 54]. Two characterized polypeptides, AltPrP [55] and AltATXN1 [56], are also encoded by overlapping sORFs.

Compared with 5'-UTRs, 3'-UTRs have attracted less attention because they were assumed not to be translated. However, translation of peptides from downstream sORFs is supported by mass spectrometry and computational analyses, and several studies have already identified peptides from downstream sORFs [52, 54]. One study revealed that AltMRVI1, encoded by an sORF in the 3'-UTR of the gene MRVI1, co-localizes and interacts with BRCA1, but its role is still unknown [52].
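The positional categories used in this section (upstream, overlapping in the +2 or +3 frame, extending into the 3' trailer, and downstream) can be expressed as a small classifier. This is a hedged sketch: the function name `classify_orf`, the 0-based end-exclusive coordinate convention, and the label strings are hypothetical and serve only to illustrate the relationships described in the text.

```python
def classify_orf(orf_start, orf_end, cds_start, cds_end):
    """Locate an ORF relative to the reference ORF (CDS) of the same mature mRNA.

    All coordinates are 0-based and end-exclusive (an assumed convention).
    Returns (location, frame); frame is None when the ORF does not overlap
    the reference ORF.
    """
    if orf_end <= cds_start:
        return ("upstream", None)
    if orf_start >= cds_end:
        return ("downstream", None)
    # +1 is the reference frame; +2 and +3 are shifted by 1 and 2 nt.
    frame = "+%d" % ((orf_start - cds_start) % 3 + 1)
    if orf_start < cds_start:
        location = "overlaps CDS start"
    elif orf_end > cds_end:
        location = "extends into 3' trailer"
    else:
        location = "within CDS"
    return (location, frame)


if __name__ == "__main__":
    # Toy mRNA with a reference ORF spanning positions 100-400.
    for orf in [(10, 40), (101, 161), (380, 440), (450, 480)]:
        print(orf, classify_orf(*orf, cds_start=100, cds_end=400))
```

An ORF reported in frame +1 shares the reading frame of the reference ORF, so only the +2 and +3 cases correspond to the alternative ORFs discussed above.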

2.3. sORFs in circular RNAs

Circular RNAs (circRNAs) are produced by non-canonical alternative splicing and form covalently closed RNA circles [57]. Because they lack the structures that are critical for efficient translation initiation, circRNAs were thought not to encode proteins. They are conserved across species and enriched in the nervous system [58]. Many studies have suggested that this class of RNAs has a wide range of functions, including mediating mRNA expression, protein sequestration and transcriptional regulation, and has potential roles in some diseases [57-62].

However, experimental evidence reveals that circRNAs also have translation potential, and a few functional products encoded by circRNAs have been identified (Fig. 2). One example is the translation of circMbl [63]. A subset of circRNAs associated with translating ribosomes (ribo-circRNAs) was identified by ribosome footprinting of fly heads. CircMbl3, a protein encoded by a circRNA generated from the gene muscleblind (Mbl), was detected by mass spectrometry. Further study showed that ribo-circRNAs allow cap-independent translation and that starvation and FOXO likely regulate the translation of a circMbl isoform. The identifiable domains in many proteins encoded by ribo-circRNAs hint at their functions. Another example is the translation of circ-ZNF609 [64]. Circ-ZNF609, which controls myoblast proliferation, contains an open reading frame that begins at the start codon shared with the linear transcript and terminates at an in-frame stop codon. It is associated with heavy polysomes and encodes a protein in a splicing-dependent and cap-independent manner [64].
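Because a circRNA has no ends, an ORF can run across the back-splice junction and terminate at a stop codon that a scan of the linear sequence would never reach. The sketch below illustrates this with modular indexing. The function name `circ_orfs`, the restriction to AUG starts, and the convention that index 0 is the first nucleotide after the junction are assumptions for illustration; it is not the analysis pipeline of the studies cited above.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def circ_orfs(circ_seq):
    """Find AUG-initiated ORFs on a circular RNA.

    Index 0 is taken as the first nucleotide after the back-splice junction
    (an assumed convention). ORFs with no in-frame stop codon anywhere around
    the circle are skipped.
    """
    seq = circ_seq.upper().replace("U", "T")
    n = len(seq)

    def codon_at(pos):
        # Read three nucleotides, wrapping around the circle.
        return "".join(seq[(pos + k) % n] for k in range(3))

    orfs = []
    for start in range(n):
        if codon_at(start) != "ATG":
            continue
        # After n codons the reading frame has returned to its starting point.
        for step in range(1, n + 1):
            pos = start + 3 * step
            if codon_at(pos) in STOP_CODONS:
                orfs.append({
                    "start": start,
                    "codons": step,  # codons before the stop, start included
                    "crosses_junction": pos + 3 > n,
                })
                break
    return orfs


if __name__ == "__main__":
    # Toy circle: the AUG at index 8 reaches its stop only after wrapping.
    print(circ_orfs("AAATAGCCATGGCC"))
```

In the toy example the ORF has no stop codon when the sequence is read as a linear molecule, yet on the circle it terminates after three codons once the junction has been crossed.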

Fig. 2. Translation of circular RNAs

2.4. Translation of pri-microRNAs and ribosomal RNAs

As small regulatory RNA molecules, miRNAs inhibit the expression of specific target genes by binding to and cleaving their mRNAs or by otherwise inhibiting their translation into proteins. Since primary transcripts (pri-miRNAs) are the precursors of miRNAs and, like mRNAs, are produced by Pol Ⅱ, it is possible that they also encode proteins. Indeed, pri-miR171b and pri-miR165a contain sORFs that encode the regulatory peptides miPEP171b and miPEP165a, respectively [65]. Both peptides enhance the accumulation of their corresponding mature miRNAs and lead to the downregulation of target genes involved in root development [65]. Five other active miPEPs encoded by pri-miRNAs of A. thaliana and M. truncatula have been found, which suggests that miPEPs are widespread in plants [65]. Whether such sORFs are present in animal pri-miRNAs is still unknown.

Two studies have shown that mitochondrial ribosomal RNAs encode functional peptides and thus have coding potential [66, 67]. Humanin, a 24-amino-acid polypeptide that is highly conserved across species, is encoded in the mitochondrial 16S rRNA [66]. This peptide functions in a variety of biological processes such as cell survival, apoptosis, inflammatory response, substrate metabolism, oxidative stress, and starvation [66, 68-70]. Another mitochondrial-derived peptide, MOTS-c, is also encoded by mitochondrial rRNA. It promotes metabolic homeostasis and reduces obesity and insulin resistance [67].

3. Perspective

sORFs have been found in various transcripts, and some functional sORF-encoded peptides have been identified in seemingly non-coding transcripts in several organisms. Since ribosome profiling has demonstrated the coding potential of thousands of RNAs previously annotated as non-coding, these functional peptides could be just the tip of the iceberg. It is time to pay special attention to this new source of peptides. Discovering and characterizing all of these short peptides remains a major challenge. Are there sORFs in other RNAs? How many sORFs are actually translated? What are the functions of these small peptides? These questions remain to be answered.


This work was supported by the Project Program of State Key Laboratory of Natural Medicines (No. SKLNMBZ201403) and the National Science and Technology Major Projects of New Drugs (Nos. 2012ZX09103301-004 and 2014ZX09508007) in China. This project was also funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD).

M.A. Basrai, P. Hieter, J.D. Boeke, Genome Res. 7 (1997) 768-771. DOI:10.1101/gr.7.8.768
M.A. Mumtaz, J.P. Couso, Biochem. Soc. Trans. 43 (2015) 1271-1276. DOI:10.1042/BST20150170
M.I. Galindo, J.I. Pueyo, S. Fouix, S.A. Bishop, J.P. Couso, PLoS Biol. 5 (2007) e106. DOI:10.1371/journal.pbio.0050106
Y. Hashimoto, T. Kondo, Y. Kageyama, Dev. Growth Differ. 50 (2008) S269-S276. DOI:10.1111/j.1440-169X.2008.00994.x
J. Crappe, W.V. Criekinge, G. Menschaert, EuPA Open Proteom. 3 (2014) 128-137. DOI:10.1016/j.euprot.2014.02.006
I.P. Ivanov, A.E. Firth, A.M. Michel, et al., Nucleic Acids Res. 39 (2011) 4220-4234. DOI:10.1093/nar/gkr007
G. Menschaert, W. Van Criekinge, T. Notelaers, et al., Mol. Cell Proteom. 12 (2013) 1780-1790. DOI:10.1074/mcp.M113.027540
S.J. Andrews, J.A. Rothnagel, Nat. Rev. Genet. 15 (2014) 193-204. DOI:10.1038/nrg3520
L. Kong, Y. Zhang, Z.Q. Ye, et al., Nucleic Acids Res. 35 (2007) W345-W349. DOI:10.1093/nar/gkm391
L.D. Hurst, Trends Genet. 18 (2002) 486. DOI:10.1016/S0168-9525(02)02722-1
M.F. Lin, J.W. Carlson, M.A. Crosby, et al., Genome Res. 17 (2007) 1823-1836. DOI:10.1101/gr.6679507
A. Stark, M.F. Lin, P. Kheradpour, et al., Nature 450 (2007) 219-232. DOI:10.1038/nature06340
G. Butler, M.D. Rasmussen, M.F. Lin, et al., Nature 459 (2009) 657-662.
M. Clamp, B. Fry, M. Kamal, et al., Proc. Natl. Acad. Sci. U. S. A. 104 (2007) 19428-19433. DOI:10.1073/pnas.0709013104
M. Guttman, I. Amit, M. Garber, et al., Nature 458 (2009) 223-227. DOI:10.1038/nature07672
M. Guttman, M. Garber, J.Z. Levin, et al., Nat. Biotechnol. 28 (2010) 503-510. DOI:10.1038/nbt.1633
M.F. Lin, I. Jungreis, M. Kellis, Bioinformatics 27 (2011) i275-i282. DOI:10.1093/bioinformatics/btr209
N.T. Ingolia, S. Ghaemmaghami, J.R. Newman, et al., Science 324 (2009) 218-223. DOI:10.1126/science.1168978
S. Lee, B. Liu, S.X. Huang, et al., Proc. Natl. Acad. Sci. U. S. A. 109 (2012) E2424-E2432. DOI:10.1073/pnas.1207846109
N.T. Ingolia, G.A. Brar, S. Rouskin, et al., Nat. Protoc. 7 (2012) 1534-1550. DOI:10.1038/nprot.2012.086
S. Iwasaki, N.T. Ingolia, Trends Biochem. Sci. 42 (2017) 612-624. DOI:10.1016/j.tibs.2017.05.004
M.V. Gerashchenko, V.N. Gladyshev, Nucleic Acids Res. 45 (2017) e6. DOI:10.1093/nar/gkw822
M. Guttman, P. Russell, N.T. Ingolia, et al., Cell 154 (2013) 240-251. DOI:10.1016/j.cell.2013.06.009
N.T. Ingolia, G.A. Brar, N. Stern-Ginossar, et al., Cell Rep. 8 (2014) 1365-1379. DOI:10.1016/j.celrep.2014.07.045
A.A. Bazzini, T.G. Johnstone, R. Christiano, et al., EMBO J. 33 (2014) 981-993. DOI:10.1002/embj.201488411
J. Crappé, E. Ndah, A. Koch, et al., Nucleic Acids Res. 43 (2015) e29. DOI:10.1093/nar/gku1283
L. Calviello, N. Mukherjee, E. Wyler, et al., Nat. Methods 13 (2016) 165-170. DOI:10.1038/nmeth.3688
J.L. Aspden, Y.C. Eyre-Walker, R.J. Phillips, et al., eLife 3 (2014) e03528.
S.A. Slavoff, A.J. Mitchell, A.G. Schwaid, et al., Nat. Chem. Biol. 9 (2013) 59-64. DOI:10.1038/nchembio.1120
Q. Chu, J. Ma, A. Saghatelian, Crit. Rev. Biochem. Mol. Biol. 50 (2015) 134-141. DOI:10.3109/10409238.2015.1016215
J.A. Vizcaino, A. Csordas, N. Del-Toro, et al., Nucleic Acids Res. 44 (2016) 11033. DOI:10.1093/nar/gkw880
T.R. Cech, J.A. Steitz, Cell 157 (2014) 77-94. DOI:10.1016/j.cell.2014.03.008
E.G. Magny, J.I. Pueyo, F.M. Pearl, et al., Science 341 (2013) 1116-1120. DOI:10.1126/science.1238802
D.M. Anderson, K.M. Anderson, C.L. Chang, et al., Cell 160 (2015) 595-606. DOI:10.1016/j.cell.2015.01.009
B.R. Nelson, C.A. Makarewich, D.M. Anderson, et al., Science 351 (2016) 271-275. DOI:10.1126/science.aad4076
A. Pauli, M.L. Norris, E. Valen, et al., Science 343 (2014) 1248636. DOI:10.1126/science.1248636
G. Menschaert, W. Van Criekinge, T. Notelaers, et al., Mol. Cell. Proteom. 12 (2013) 1780-1790. DOI:10.1074/mcp.M113.027540
N.G. D'Lima, J. Ma, L. Winkler, et al., Nat. Chem. Biol. 13 (2017) 174-180. DOI:10.1038/nchembio.2249
A.M. Michel, D.E. Andreev, P.V. Baranov, BMC Bioinform. 15 (2014) 380. DOI:10.1186/s12859-014-0380-4
A. Matsumoto, A. Pasut, M. Matsumoto, et al., Nature 541 (2017) 228-232. DOI:10.1038/nature21034
S.E. Calvo, D.J. Pagliarini, V.K. Mootha, Proc. Natl. Acad. Sci. U. S. A. 106 (2009) 7507-7512. DOI:10.1073/pnas.0810916106
L.E. Cabrera-Quio, S. Herberg, A. Pauli, RNA Biol. 13 (2016) 1051-1059. DOI:10.1080/15476286.2016.1218589
Y. Ye, Y. Liang, Q. Yu, et al., Hum. Genet. 134 (2015) 605-612. DOI:10.1007/s00439-015-1544-7
S.E. Calvo, D.J. Pagliarini, V.K. Mootha, Proc. Natl. Acad. Sci. U. S. A. 106 (2009) 7507-7512. DOI:10.1073/pnas.0810916106
J.T. Mendell, N.A. Sharifi, J.L. Meyers, et al., Nat. Genet. 36 (2004) 1073-1078. DOI:10.1038/ng1429
H. Yepiskoposyan, F. Aeschimann, D. Nilsson, et al., RNA 17 (2011) 2108-2118. DOI:10.1261/rna.030247.111
K.A. Spriggs, M. Bushell, A.E. Willis, Mol. Cell 40 (2010) 228-237. DOI:10.1016/j.molcel.2010.09.028
C. Jousse, et al., Nucleic Acids Res. 29 (2001) 4341-4351. DOI:10.1093/nar/29.21.4341
C. Akimoto, E. Sakashita, K. Kasashima, et al., Biochim. Biophys. Acta 1830 (2013) 2728-2738. DOI:10.1016/j.bbagen.2012.12.010
G.L. Yosten, J. Liu, H. Ji, et al., J. Physiol. 594 (2016) 1601-1605. DOI:10.1113/JP270567
B. Vanderperre, J.F. Lucier, C. Bissonnette, et al., PLoS One 8 (2013) e70698. DOI:10.1371/journal.pone.0070698
H. Mouilleron, V. Delcourt, X. Roucou, Nucleic Acids Res. 44 (2016) 14-23. DOI:10.1093/nar/gkv1218
S.A. Slavoff, et al., Nat. Chem. Biol. 9 (2013) 59-64. DOI:10.1038/nchembio.1120
B. Vanderperre, et al., FASEB J. 25 (2011) 2373-2386. DOI:10.1096/fj.10-173815
D. Bergeron, et al., J. Biol. Chem. 288 (2013) 21824-21835. DOI:10.1074/jbc.M113.472654
L.J. Li, Q. Huang, H.F. Pan, et al., Exp. Cell Res. 346 (2016) 248-254. DOI:10.1016/j.yexcr.2016.07.021
D. van Rossum, B.M. Verheijen, R.J. Pasterkamp, Front. Mol. Neurosci. 9 (2016) 74.
M. Cortés-López, P. Miura, Yale J. Biol. Med. 89 (2016) 527-537.
D. Rong, H. Sun, Z. Li, et al., Oncotarget 8 (2017) 73271-73281.
S. Qu, Z. Liu, X. Yang, Cancer Lett. 414 (2018) 301-309. DOI:10.1016/j.canlet.2017.11.022
M.M. Jiang, Z.T. Mai, S.Z. Wan, et al., J. Cancer Res. Clin. Oncol. 144 (2018) 667-674. DOI:10.1007/s00432-017-2576-2
N.R. Pamudurti, O. Bartok, M. Jens, et al., Mol. Cell 66 (2017) 9-21. DOI:10.1016/j.molcel.2017.02.021
I. Legnini, G. Di Timoteo, F. Rossi, et al., Mol. Cell 66 (2017) 22-37. DOI:10.1016/j.molcel.2017.02.017
D. Lauressergues, J.M. Couzigou, H.S. Clemente, et al., Nature 520 (2015) 90-93. DOI:10.1038/nature14346
Y. Hashimoto, T. Niikura, H. Tajima, et al., Proc. Natl. Acad. Sci. U. S. A. 98 (2001) 6336-6341. DOI:10.1073/pnas.101133498
C. Lee, J. Zeng, G.B. Drew, et al., Cell Metab. 21 (2015) 443-454. DOI:10.1016/j.cmet.2015.02.009
B. Guo, D. Zhai, E. Cabezas, et al., Nature 423 (2003) 456-461. DOI:10.1038/nature01627
D. Zhai, F. Luciano, X. Zhu, et al., J. Biol. Chem. 280 (2005) 15815-15824. DOI:10.1074/jbc.M411902200
C. Lee, K. Yen, P. Cohen, et al., Trends Endocrinol. Metab. 24 (2013) 222-228. DOI:10.1016/j.tem.2013.01.005