Wikipedia-based Extraction of Lightweight Ontologies for Concept Level Annotation
Last modified: 2014-09-27
Abstract
This poster describes a project under development in which we propose a framework for automating the development of lightweight ontologies for semantic annotation. To build annotation ontologies for a given domain, we follow the ontology learning process described in Stelios 2006. Because we target lightweight ontologies, however, we consider only a subset of its tasks: acquiring domain terminology, generating concept hierarchies, learning relations and properties, and evaluating the ontology.
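The abstract lists generating concept hierarchies among the tasks but does not say how it is done. The framework draws on Wikipedia's category structure, so one plausible minimal sketch treats each category's parent links as broader-than relations and walks them to collect a concept's ancestors. The category names and parent links are hypothetical toy data, not taken from the framework. Real Wikipedia category graphs contain cycles, so the traversal keeps a `seen` set.

```python
from collections import defaultdict

# Hypothetical toy data: each category mapped to its parent (broader) categories.
CATEGORY_PARENTS = {
    "Machine learning": ["Artificial intelligence"],
    "Natural language processing": ["Artificial intelligence", "Computational linguistics"],
    "Artificial intelligence": ["Computer science"],
    "Computational linguistics": ["Linguistics"],
}


def narrower_map(parents):
    """Invert child -> parents links into broader -> narrower links."""
    children = defaultdict(set)
    for child, broader in parents.items():
        for parent in broader:
            children[parent].add(child)
    return dict(children)


def ancestors(category, parents):
    """Collect every broader category reachable from `category`.

    The `seen` set stops the walk from looping on cyclic category links.
    """
    seen, stack = set(), [category]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(ancestors("Natural language processing", CATEGORY_PARENTS)))
# ['Artificial intelligence', 'Computational linguistics', 'Computer science', 'Linguistics']
```

Wikipedia categories are not a strict taxonomy, so a hierarchy built this way would likely need filtering (for example, keeping only categories under a chosen domain root) before serving as an ontology backbone.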
In developing the framework modules, we draw most of our knowledge base from the structure of Wikipedia: the category and link structure of Wikipedia pages, together with specific sections of their content. To ensure machine understandability and interoperability, ontologies have to be explicit, so that an annotation is publicly accessible; formal, so that it is publicly agreeable; and unambiguous, so that it is publicly identifiable. An important aspect of building a domain ontology is defining an annotation schema that allows the resulting ontologies to be reused and to become part of linked data. We based our schema on annotation elements already defined in the Dublin Core standards, and we used the DBpedia schema to define annotation elements for named entities. We also developed additional annotation elements that define domain concepts and their context, as well as relations between concepts. These elements are derived from the link structure of Wikipedia, the concepts' definitions in their Wikipedia pages, and the category structure of the concepts. The framework modules include domain concept extraction, semantic relatedness measures, concept clustering, and Wikipedia-based relation extraction.
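The abstract does not name the semantic relatedness measure the framework uses. As one plausible example of a measure built purely on Wikipedia's link structure, the sketch below implements the Wikipedia Link-based Measure of Milne and Witten (2008). It compares the sets of articles that link to two concept pages. This is a swapped-in technique, not necessarily the authors' choice, and the in-link sets in the usage line are hypothetical.

```python
import math


def link_relatedness(inlinks_a, inlinks_b, total_articles):
    """Wikipedia Link-based Measure (Milne & Witten, 2008).

    inlinks_a, inlinks_b: IDs of articles that link to concept A and concept B.
    total_articles: number of articles in the Wikipedia snapshot.
    Returns a relatedness score in [0, 1].
    """
    a, b = set(inlinks_a), set(inlinks_b)
    common = a & b
    if not common:
        return 0.0  # no shared in-links: treat as unrelated
    big, small = max(len(a), len(b)), min(len(a), len(b))
    if total_articles <= big:
        raise ValueError("total_articles must exceed both in-link counts")
    # Normalized distance, modelled on the Normalized Google Distance.
    distance = (math.log(big) - math.log(len(common))) / (
        math.log(total_articles) - math.log(small)
    )
    return max(0.0, 1.0 - distance)  # clip: distance can exceed 1


# Hypothetical in-link sets for two concept pages.
print(round(link_relatedness({1, 2, 3, 4}, {3, 4, 5, 6}, 1000), 3))  # 0.874
```

The measure needs only link sets rather than article text, so it is cheap to compute over many concept pairs. A pairwise score like this could feed a concept clustering step, though the abstract does not say whether the framework works that way.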