Curated genome annotation of Oryza sativa ssp. japonica and comparative genome analysis with Arabidopsis thaliana.
Rice Annotation Project, Itoh T., Tanaka T., Barrero RA., Yamasaki C., Fujii Y., Hilton PB., Antonio BA., Aono H., Apweiler R., Bruskiewich R., Bureau T., Burr F., Costa de Oliveira A., Fuks G., Habara T., Haberer G., Han B., Harada E., Hiraki AT., Hirochika H., Hoen D., Hokari H., Hosokawa S., Hsing Y-I., Ikawa H., Ikeo K., Imanishi T., Ito Y., Jaiswal P., Kanno M., Kawahara Y., Kawamura T., Kawashima H., Khurana JP., Kikuchi S., Komatsu S., Koyanagi KO., Kubooka H., Lieberherr D., Lin Y-C., Lonsdale D., Matsumoto T., Matsuya A., McCombie WR., Messing J., Miyao A., Mulder N., Nagamura Y., Nam J., Namiki N., Numa H., Nurimoto S., O'Donovan C., Ohyanagi H., Okido T., Oota S., Osato N., Palmer LE., Quetier F., Raghuvanshi S., Saichi N., Sakai H., Sakai Y., Sakata K., Sakurai T., Sato F., Sato Y., Schoof H., Seki M., Shibata M., Shimizu Y., Shinozaki K., Shinso Y., Singh NK., Smith-White B., Takeda J-I., Tanino M., Tatusova T., Thongjuea S., Todokoro F., Tsugane M., Tyagi AK., Vanavichit A., Wang A., Wing RA., Yamaguchi K., Yamamoto M., Yamamoto N., Yu Y., Zhang H., Zhao Q., Higo K., Burr B., Gojobori T., Sasaki T.
We present the annotation of the complete genome of rice (Oryza sativa L. ssp. japonica cultivar Nipponbare). All functional annotations for proteins and non-protein-coding RNA (npRNA) candidates were manually curated. Functions were identified or inferred for 19,969 (70%) of the proteins, and 131 possible npRNAs (including 58 antisense transcripts) were found. Almost 5,000 annotated protein-coding genes were found to be disrupted in insertional mutant lines, which will accelerate future experimental validation of the annotations. The rice loci were determined by using cDNA sequences obtained from rice and other representative cereals. Our conservative estimate, based on these loci and an extrapolation, suggests that rice has approximately 32,000 genes, fewer than previous estimates. We conducted comparative analyses between rice and Arabidopsis thaliana and found that both genomes possess several lineage-specific genes, which might account for the observed differences between these species, while sharing similar sets of predicted functional domains among the protein sequences. A system to control translational efficiency seems to be conserved across large evolutionary distances. Moreover, we examined the evolutionary process of protein-coding genes. Our results suggest that natural selection may have acted on duplicated genes in both species, so that duplication was suppressed or favored depending on the function of a gene.