TL;DR: SSCO, a self-supervised-learning-based Chinese ontology containing about 255 thousand concepts, 5 million entities, and 40 million facts, is presented. Manual evaluation shows that the ontology has excellent precision, and comparison with other well-known ontologies and knowledge bases indicates high coverage.
Abstract: Constructing an ontology manually is a time-consuming, error-prone, and tedious task. We present SSCO, a self-supervised-learning-based Chinese ontology, which contains about 255 thousand concepts, 5 million entities, and 40 million facts. We explore the three largest online Chinese encyclopedias for ontology learning and describe how to transfer the structured knowledge in encyclopedias, including article titles, category labels, redirection pages, taxonomy systems, and InfoBox modules, into ontological form. To avoid the errors in encyclopedias and to enrich the learnt ontology, we also apply machine-learning-based methods. First, we show, both statistically and experimentally, that self-supervised machine learning is practicable for Chinese relation extraction (at least for synonymy and hyponymy), and we train self-supervised models (SVMs and CRFs) for synonymy extraction, concept-subconcept relation extraction, and concept-instance relation extraction. The advantage of our methods is that all training examples are generated automatically from the structural information of encyclopedias and a few general heuristic rules. Finally, we evaluate SSCO in two aspects, scale and precision. Manual evaluation results show that the ontology has excellent precision, and comparison of SSCO with other well-known ontologies and knowledge bases indicates high coverage. The experimental results also indicate that the self-supervised models clearly enrich SSCO.
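As an illustration of how training examples might be generated automatically from encyclopedia structure, the following minimal Python sketch derives positive synonymy pairs from redirection pages and positive hyponymy pairs from category labels, then produces negative pairs with a crude heuristic. The function names, the toy data, and the negative-sampling rule are all assumptions made for illustration; the abstract does not specify the actual heuristics or data formats used to build SSCO.

```python
from itertools import combinations


def synonymy_examples(redirects):
    """Positive (alias, title) pairs; a redirect is assumed to signal synonymy."""
    return sorted({(alias, title) for alias, title in redirects.items() if alias != title})


def hyponymy_examples(categories):
    """Positive (article, category) pairs; a category label is assumed to signal hyponymy."""
    return sorted({(article, cat) for article, cats in categories.items() for cat in cats})


def negative_examples(positives, vocabulary):
    """Pairs of terms never labeled positive in either direction (hypothetical heuristic)."""
    known = set(positives) | {(b, a) for a, b in positives}
    return [pair for pair in combinations(sorted(vocabulary), 2) if pair not in known]


# Toy stand-ins for encyclopedia dumps (hypothetical data).
redirects = {"北京市": "北京", "沪": "上海"}          # redirect page -> target article
categories = {"北京": ["城市"], "上海": ["城市"]}     # article -> category labels

syn_pos = synonymy_examples(redirects)
hyp_pos = hyponymy_examples(categories)
vocab = {term for pair in syn_pos for term in pair}
syn_neg = negative_examples(syn_pos, vocab)

print("synonymy positives:", syn_pos)
print("hyponymy positives:", hyp_pos)
print("synonymy negatives:", syn_neg)
```

The labeled pairs produced this way could then be turned into feature vectors and passed to a classifier such as an SVM, with no manual annotation involved; feature design is outside the scope of this sketch.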
TL;DR: A web portal that provided a ‘filtered’ version of the Internet, with links to the most trustworthy available article on a given subject, would be a boon to scientists and science enthusiasts alike.
Abstract: SIR — Your News story “Experts plan to reclaim the web for pop science” (Nature 439, 516–517; 2006) describes a project called the Digital Universe, which aims to create, and provide links to, trustworthy peer-reviewed content on the Internet. As suggested by critics in your News story, it seems wasteful to try to reproduce content that already exists in an open, accessible and improvable form. I would encourage those working on projects such as the Digital Universe to consider a strategy that truly leverages the power of the Internet. For example, the free and editable encyclopedia Wikipedia (en.wikipedia.org) contains innumerable articles that are scientifically accurate, although it obviously contains errors, omissions and some articles of low quality. But MediaWiki, the software upon which Wikipedia is based (see www.mediawiki.org), allows one to link to specific versions of articles. Thus, expert peer reviewers could analyse articles, improve them and provide links to the trusted version of that article. A web portal that provided a ‘filtered’ version of the Internet, with links to the most trustworthy available article on a given subject, would be a boon to scientists and science enthusiasts alike. Kevin Yager Department of Chemistry, McGill University, Lab 406, Otto Maass Chemistry Building, 801 Sherbrooke Street West, Montréal, Québec H3A 2K6, Canada