Deep sequencing of short capped RNAs reveals novel families of noncoding RNAs.
De Hoon MJL., Bonetti A., Plessy C., Ando Y., Hon C-C., Ishizu Y., Itoh M., Katoh S., Lin D., Maekawa S., Murata M., Nishiyori H., Shin JW., Stolte J., Suzuki AM., Tagami M., Takahashi H., Thongjuea S., Forrest AR., Hayashizaki Y., Kere J., Carninci P.
In eukaryotes, capped RNAs include long transcripts such as messenger RNAs and long noncoding RNAs, as well as shorter transcripts such as spliceosomal RNAs, small nucleolar RNAs, and enhancer RNAs. Long capped transcripts can be profiled using Cap Analysis Gene Expression (CAGE) sequencing and other methods. Here, we describe a sequencing library preparation protocol for short capped RNAs, apply it to a differentiation time course of the human cell line THP-1, and systematically compare the landscape of short capped RNAs to that of long capped RNAs. Transcription initiation peaks associated with genes in the sense direction showed a strong preference for producing either long or short capped RNAs, with 1 out of 6 peaks detected in the short capped RNA libraries only. Gene-associated short capped RNAs had highly specific 3' ends, typically overlapping splice sites. Enhancers also preferentially generated either short or long capped RNAs, with 10% of enhancers observed in the short capped RNA libraries only. Enhancers producing short capped RNAs and enhancers producing long capped RNAs were both enriched for GWAS-associated disease SNPs. We conclude that deep sequencing of short capped RNAs reveals new families of noncoding RNAs and elucidates the diversity of transcripts generated at known and novel promoters and enhancers.