Efficient Extraction of Ontologies from Domain Specific Text Corpora

ID
TR-2012-04
Authors
Tianyu Li, Pirooz Chubak, Laks V.S. Lakshmanan and Rachel Pottinger
Publishing date
July 26, 2012
Length
12 pages
Abstract
Extracting ontological relationships (e.g., isa and hasa) from free-text repositories (e.g., engineering documents and instruction manuals) can improve users’ queries, as well as benefit applications built for these domains. Current methods to extract ontologies from text usually miss many meaningful relationships because they either concentrate on single-word terms and short phrases or neglect syntactic relationships between concepts in sentences. We propose a novel pattern-based algorithm to find ontological relationships between complex concepts by exploiting parsing information to extract multi-word concepts and nested concepts. Our procedure is iterative: we tailor the constrained sequential pattern mining framework to discover new patterns. Our experiments on three real data sets show that our algorithm consistently and significantly outperforms previous representative ontology extraction algorithms.