The Japan Science and Technology Agency (JST) (President: Michinari Hamaguchi) and the Research Organization of Information and Systems (President: Genshiro Kitagawa) have launched the first portal website in Japan that aggregates various life science databases in RDF format* (Figure 1). The portal provides RDF datasets deposited by various research organizations and allows users to browse the descriptions of those datasets, download the data, and search the data with queries written in SPARQL, a standard query language for RDF data.
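As a rough illustration of what searching with SPARQL involves, the sketch below builds a query URL following the SPARQL 1.1 Protocol (the query travels in a `query` parameter) and flattens a response in the standard SPARQL JSON results format. The endpoint URL, the query, and the sample response are all assumptions made for illustration; none of them comes from the portal itself, and no network request is made.

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint: the portal's real SPARQL endpoint URL is not given here.
ENDPOINT = "https://example.org/sparql"

# A generic query that lists up to five arbitrary triples.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
""".strip()


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_results(results_json: str) -> list:
    """Flatten a SPARQL JSON results document into one dict per result row."""
    doc = json.loads(results_json)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in doc["results"]["bindings"]
    ]


# Hand-written sample response in SPARQL JSON results format (not real portal data).
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["s", "p", "o"]},
    "results": {"bindings": [{
        "s": {"type": "uri", "value": "http://example.org/gene/1"},
        "p": {"type": "uri", "value": "http://www.w3.org/2000/01/rdf-schema#label"},
        "o": {"type": "literal", "value": "example gene"},
    }]},
})

url = build_query_url(ENDPOINT, QUERY)
rows = rows_from_results(SAMPLE_RESPONSE)
```

In practice, a client would send the URL with an `Accept: application/sparql-results+json` header and pass the response body to `rows_from_results`.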
A wide variety of life science databases exist, but their diverse terminologies and data formats have hindered their integrated use. To address this problem, RDF, which facilitates interoperability and automated data processing, has been adopted as a new data format around the world, including in Japan (Figure 2).
The National Bioscience Database Center (NBDC) of JST and the Database Center for Life Science (DBCLS) of the Research Organization of Information and Systems have likewise been encouraging research groups that develop databases, in Japan and around the world, to adopt the RDF format, and have been building the portal site.
The portal site already provides an initial collection of ten RDF datasets (Table 1), and six or more RDF datasets are planned to be added soon. Around the same time, major life science database centers in the U.S. and Europe have started providing their data in the RDF format. What distinguishes our portal site is that it aggregates a large variety of databases from multiple organizations, thereby helping researchers from a broad range of domains collaborate.
Prior to the launch of the portal site, DBCLS developed and released guidelines for generating high-quality RDF data. All the RDF datasets provided by the portal site conform to the guidelines and meet the criteria for interoperability.
The RDF datasets provided by the portal are expected to be easy to integrate with other RDF datasets available around the world, which will reduce data-handling costs. For example, without such an RDF portal, a major challenge in searching distributed databases for potential drug candidates has been aggregating the relevant databases into a unified one, which required expertise and an enormous amount of time. The RDF data portal can potentially eliminate the time and cost of that process, and it is expected to advance multidisciplinary research in which data coordination is essential. Other examples of such research include personalized medicine based on the combined use of genetic mutation and drug activity data, and metagenomics, which needs to deal with environmental or intestinal flora information. In addition, because RDF data can be easily used by computer programs, the rich data in the portal is expected to help answer complicated questions in the life sciences when incorporated into artificial intelligence systems, a field that has seen major technological advances in recent years.
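To make the integration argument concrete, here is a toy sketch of why datasets that share identifiers are cheap to combine. It models each RDF statement as a (subject, predicate, object) triple in plain Python. Every URI, predicate name, and data value is invented for illustration, and the in-memory sets stand in for real RDF stores; this is not how the portal itself is implemented.

```python
# Each RDF statement is modeled as a (subject, predicate, object) tuple.
# All URIs and values below are made up for illustration.
EX = "http://example.org/"

# Hypothetical dataset A: which compounds inhibit which proteins.
dataset_a = {
    (EX + "compound/1", EX + "inhibits", EX + "protein/P1"),
    (EX + "compound/2", EX + "inhibits", EX + "protein/P2"),
}

# Hypothetical dataset B: which proteins are associated with which diseases.
dataset_b = {
    (EX + "protein/P1", EX + "associatedWith", EX + "disease/D1"),
}

# Because both datasets name proteins with the same URIs,
# integrating them is a plain set union with no schema mapping.
merged = dataset_a | dataset_b


def compounds_for_disease(triples, disease):
    """Find compounds that inhibit any protein associated with the disease."""
    proteins = {
        s for (s, p, o) in triples
        if p == EX + "associatedWith" and o == disease
    }
    return sorted(
        s for (s, p, o) in triples
        if p == EX + "inhibits" and o in proteins
    )


candidates = compounds_for_disease(merged, EX + "disease/D1")
```

Neither dataset can answer the question alone; the answer only appears once the two sets of triples are merged, which is the kind of cross-database search described above.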
NBDC RDF portal website: http://integbio.
DBCLS RDF guidelines website: http://wiki.
*RDF (Resource Description Framework) format: To make use of the vast range of information available on the Internet, technology that lets computers process it automatically and with high precision is essential. The World Wide Web Consortium (W3C), the international standards organization for the Web, recommends RDF as an international standard format that allows easy processing of data on the Internet. When data are stored in the RDF format, computers can process them automatically, and researchers can use data from a diverse range of fields.