Ontology learning

From Wikipedia, the free encyclopedia

Automatic creation of ontologies

Ontology learning (ontology extraction, ontology augmentation generation, ontology generation, or ontology acquisition) is the automatic or semi-automatic creation of ontologies, including extracting the corresponding domain's terms and the relationships between the concepts that these terms represent from a corpus of natural language text, and encoding them with an ontology language for easy retrieval. As building ontologies manually is extremely labor-intensive and time-consuming, there is great motivation to automate the process.

Typically, the process starts by extracting terms and concepts or noun phrases from plain text using linguistic processors such as part-of-speech tagging and phrase chunking. Then statistical^[1] or symbolic^[2]^[3] techniques are used to extract relation signatures, often based on pattern-based^[4] or definition-based^[5] hypernym extraction techniques.

Procedure

Ontology learning (OL) is used to (semi-)automatically extract whole ontologies from natural language text.^[6]^[7] The process is usually split into the following eight tasks, which are not all necessarily applied in every ontology learning system.

Domain terminology extraction

During the domain terminology extraction step, domain-specific terms are extracted, which are used in the following step (concept discovery) to derive concepts. Relevant terms can be determined, e.g., by calculating TF/IDF values or by applying the C-value / NC-value method. The resulting list of terms has to be filtered by a domain expert. In the subsequent step, similarly to coreference resolution in information extraction, the OL system determines synonyms, because they share the same meaning and therefore correspond to the same concept. The most common methods for this are clustering and the application of statistical similarity measures.
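A minimal sketch of TF/IDF-based term scoring, using only the Python standard library. The toy corpus, the function name `tfidf_scores`, and the choice of relative term frequency with an unsmoothed logarithmic IDF are assumptions made for illustration; real systems typically add linguistic filtering and a domain expert's review, as described above.

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Return, for each tokenized document, a dict mapping term -> TF-IDF score."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            # Relative term frequency times unsmoothed log inverse document frequency.
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores

# Hypothetical toy corpus: one domain document and two general ones.
docs = [
    "the ontology describes the protein and the enzyme".split(),
    "the weather is nice and the sky is clear".split(),
    "the train leaves and the bus arrives".split(),
]
domain_scores = tfidf_scores(docs)[0]
# Candidate domain terms, highest score first.
candidates = sorted(domain_scores, key=domain_scores.get, reverse=True)
```

Words that occur in every document, such as "the", receive a score of zero, while words confined to the domain document rank highest and become candidate terms.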

Concept discovery

In the concept discovery step, terms are grouped into meaning-bearing units, which correspond to an abstraction of the world and therefore to concepts. The grouped terms are the domain-specific terms and their synonyms, which were identified in the domain terminology extraction step.
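One simple way to picture this grouping, assuming the synonym pairs have already been determined in the previous step, is to merge terms that are linked by synonymy into sets. The sketch below uses a union-find structure; the function name `group_terms` and the sample terms are hypothetical.

```python
def group_terms(terms, synonym_pairs):
    """Group terms into concepts: terms linked by synonymy end up in the same set."""
    parent = {t: t for t in terms}

    def find(t):
        # Follow parent links to the representative, shortening the path on the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in synonym_pairs:
        parent[find(a)] = find(b)

    concepts = {}
    for t in terms:
        concepts.setdefault(find(t), set()).add(t)
    return list(concepts.values())

# Hypothetical input: extracted terms and the synonym pairs found for them.
terms = ["car", "automobile", "auto", "bicycle", "bike"]
pairs = [("car", "automobile"), ("auto", "car"), ("bike", "bicycle")]
concepts = group_terms(terms, pairs)
```

Each resulting set stands for one concept; which term serves as the concept's label is left open here.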

Concept hierarchy derivation

In the concept hierarchy derivation step, the OL system tries to arrange the extracted concepts in a taxonomic structure. This is mostly achieved with unsupervised hierarchical clustering methods. Because the result of such methods is often noisy, a supervision step, e.g., user evaluation, is added. A further method for deriving a concept hierarchy uses patterns that indicate a sub- or supersumption relationship. Patterns like “X, that is a Y” or “X is a Y” indicate that X is a subclass of Y. Such patterns can be analyzed efficiently, but they often occur too infrequently to extract enough sub- or supersumption relationships. Instead, bootstrapping methods have been developed, which learn these patterns automatically and therefore ensure broader coverage.
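The two patterns quoted above can be approximated with a regular expression. This is a rough sketch under strong assumptions: X and Y are single words, the regular expression `IS_A` and the function `extract_is_a` are illustrative names, and no part-of-speech information is used, so sentences such as "This is a test" would produce spurious pairs.

```python
import re

# Hypothetical pattern covering "X is a Y" and "X, that is a Y" for single-word X and Y.
IS_A = re.compile(r"\b(\w+)(?:, that)? is an? (\w+)\b", re.IGNORECASE)

def extract_is_a(text):
    """Return (subclass, superclass) pairs matched by the is-a pattern."""
    return [(m.group(1).lower(), m.group(2).lower()) for m in IS_A.finditer(text)]

text = "A sparrow is a bird. An enzyme is a protein. The oak, that is a tree, grows slowly."
pairs = extract_is_a(text)
# pairs == [("sparrow", "bird"), ("enzyme", "protein"), ("oak", "tree")]
```

Hand-written patterns like this one are precise on matching sentences but cover few of them, which is the motivation for learning patterns by bootstrapping.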

Learning of non-taxonomic relations

In the learning of non-taxonomic relations step, relationships are extracted that do not express any sub- or supersumption. Such relationships are, e.g., works-for or located-in. There are two common approaches to solve this subtask. The first is based upon the extraction of anonymous associations, which are named appropriately in a second step. The second approach extracts verbs, which indicate a relationship between entities, represented by the surrounding words. The results of both approaches need to be evaluated by an ontologist to ensure accuracy.
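As an illustration of the first approach, anonymous associations can be approximated by counting how often two known concepts occur in the same sentence. This is only one possible reading, not a method prescribed by the text: the function name `anonymous_associations`, the whitespace tokenization, and the `min_count` threshold are all assumptions. Naming the resulting associations would be a separate, later step.

```python
from collections import Counter
from itertools import combinations

def anonymous_associations(sentences, concepts, min_count=2):
    """Count sentence-level co-occurrences of known concepts (a set) and keep frequent pairs."""
    counts = Counter()
    for sentence in sentences:
        # Concepts mentioned in this sentence, sorted so each pair has one canonical order.
        present = sorted(concepts & set(sentence.lower().split()))
        counts.update(combinations(present, 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Hypothetical input sentences and concept labels.
sentences = ["alice works for acme", "acme employs alice", "bob lives in paris"]
concepts = {"alice", "acme", "bob", "paris"}
associations = anonymous_associations(sentences, concepts)
# associations == {("acme", "alice"): 2}
```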

Rule discovery

During rule discovery,^[8] axioms (formal descriptions of concepts) are generated for the extracted concepts. This can be achieved, e.g., by analyzing the syntactic structure of a natural language definition and applying transformation rules to the resulting dependency tree. The result of this process is a list of axioms, which is afterwards combined into a concept description. This output is then evaluated by an ontologist.

Ontology population

At this step, the ontology is augmented with instances of concepts and properties. For the augmentation with instances of concepts, methods based on the matching of lexico-syntactic patterns are used. Instances of properties are added through the application of bootstrapping methods, which collect relation tuples.
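A small sketch of pattern-based instance extraction, assuming a single lexico-syntactic pattern of the form "Ys such as A, B and C" with single-word concept labels and instances. The names `SEPARATOR`, `SUCH_AS`, and `extract_instances` are hypothetical, and the pattern is far narrower than what a production system would use.

```python
import re

# Separators allowed between listed instances; ", and " is tried before ", ".
SEPARATOR = r", and | and | or |, "
# Hypothetical pattern: a concept word, "such as", then a list of single-word instances.
SUCH_AS = re.compile(r"(\w+) such as ((?:\w+(?:" + SEPARATOR + r")?)+)")

def extract_instances(text):
    """Map each concept label to the instances listed after 'such as'."""
    found = {}
    for m in SUCH_AS.finditer(text):
        instances = [i for i in re.split(SEPARATOR, m.group(2)) if i]
        found.setdefault(m.group(1).lower(), []).extend(instances)
    return found

text = ("European cities such as Paris, Berlin and Rome attract tourists. "
        "She likes rivers such as Elbe or Rhine.")
instances = extract_instances(text)
# instances == {"cities": ["Paris", "Berlin", "Rome"], "rivers": ["Elbe", "Rhine"]}
```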

Concept hierarchy extension

In this step, the OL system tries to extend the taxonomic structure of an existing ontology with further concepts. This can be performed in a supervised manner with a trained classifier or in an unsupervised manner via the application of similarity measures.
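The unsupervised variant can be sketched as a nearest-neighbour lookup: a new concept is placed next to the existing concept whose context vector is most similar. Cosine similarity over sparse word-count vectors is an assumption here, as are the names `cosine` and `nearest_concept` and the toy vectors; whether the new concept is then attached as a child or as a sibling of its nearest neighbour is a separate design decision.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def nearest_concept(new_vec, existing):
    """Return the name of the existing concept most similar to the new one."""
    return max(existing, key=lambda name: cosine(new_vec, existing[name]))

# Hypothetical context vectors: counts of words seen near each concept.
existing = {
    "bird": {"fly": 3, "wing": 2, "nest": 1},
    "vehicle": {"drive": 3, "wheel": 2, "road": 1},
}
sparrow = {"fly": 2, "nest": 2, "sing": 1}
anchor = nearest_concept(sparrow, existing)  # "bird"
```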

Frame and event detection

During frame/event detection, the OL system tries to extract complex relationships from text, e.g., who departed from where to what place and when. Approaches range from applying SVM with kernel methods to semantic role labeling (SRL)^[9] to deep semantic parsing techniques.^[10]

Tools

DOG4DAG (Dresden Ontology Generator for Directed Acyclic Graphs) is an ontology generation plugin for Protégé 4.1 and OBO-Edit 2.1. It allows for term generation, sibling generation, definition generation, and relationship induction. Integrated into Protégé 4.1 and OBO-Edit 2.1, DOG4DAG allows ontology extension for all common ontology formats (e.g., OWL and OBO). It is largely limited to EBI and BioPortal lookup service extensions.^[11]

Bibliography

P. Buitelaar, P. Cimiano (Eds.). Ontology Learning and Population: Bridging the Gap between Text and Knowledge, Frontiers in Artificial Intelligence and Applications, IOS Press, 2008.
P. Buitelaar, P. Cimiano, and B. Magnini (Eds.). Ontology Learning from Text: Methods, Evaluation and Applications, Frontiers in Artificial Intelligence and Applications, IOS Press, 2005.
Wong, W. (2009), "Learning Lightweight Ontologies from Text across Different Domains using the Web as Background Knowledge". Doctor of Philosophy thesis, University of Western Australia.
Wong, W., Liu, W. & Bennamoun, M. (2012), "Ontology Learning from Text: A Look back and into the Future". ACM Computing Surveys, Volume 44, Issue 4, Pages 20:1-20:36.
Thomas Wächter, Götz Fabian, Michael Schroeder: DOG4DAG: semi-automated ontology generation in OBO-Edit and Protégé. SWAT4LS London, 2011. doi:10.1145/2166896.2166926

References

^[1] A. Maedche and S. Staab. Learning ontologies for the semantic web. In Semantic Web Workshop 2001.
^[2] Roberto Navigli and Paola Velardi. Learning Domain Ontologies from Document Warehouses and Dedicated Web Sites, Computational Linguistics, 30(2), MIT Press, 2004, pp. 151-179.
^[3] P. Velardi, S. Faralli, R. Navigli. OntoLearn Reloaded: A Graph-based Algorithm for Taxonomy Induction. Computational Linguistics, 39(3), MIT Press, 2013, pp. 665-707.
^[4] Marti A. Hearst. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the Fourteenth International Conference on Computational Linguistics, pages 539-545, Nantes, France, July 1992.
^[5] R. Navigli, P. Velardi. Learning Word-Class Lattices for Definition and Hypernym Extraction. Proc. of the 48th Annual Meeting of the Association for Computational Linguistics (ACL 2010), Uppsala, Sweden, July 11–16, 2010, pp. 1318-1327.
^[6] Cimiano, Philipp; Völker, Johanna; Studer, Rudi (2006). "Ontologies on Demand? - A Description of the State-of-the-Art, Applications, Challenges and Trends for Ontology Learning from Text", Information, Wissenschaft und Praxis, 57, pp. 315-320, http://people.aifb.kit.edu/pci/Publications/iwp06.pdf (retrieved: 18.06.2012).
^[7] Wong, W., Liu, W. & Bennamoun, M. (2012), "Ontology Learning from Text: A Look back and into the Future". ACM Computing Surveys, Volume 44, Issue 4, Pages 20:1-20:36.
^[8] Johanna Völker; Pascal Hitzler; Cimiano, Philipp (2007). "Acquisition of OWL DL Axioms from Lexical Resources", Proceedings of the 4th European conference on The Semantic Web, pp. 670-685, http://smartweb.dfki.de/Vortraege/lexo_2007.pdf (retrieved: 18.06.2012).
^[9] Coppola B.; Gangemi A.; Gliozzo A.; Picca D.; Presutti V. (2009). "Frame Detection over the Semantic Web", Proceedings of the European Semantic Web Conference (ESWC2009), Springer, 2009.
^[10] Presutti V.; Draicchio F.; Gangemi A. (2012). "Knowledge extraction based on Discourse Representation Theory and Linguistic Frames", Proceedings of the Conference on Knowledge Engineering and Knowledge Management (EKAW2012), LNCS, Springer, 2012.
^[11] Thomas Wächter, Götz Fabian, Michael Schroeder: DOG4DAG: semi-automated ontology generation in OBO-Edit and Protégé. SWAT4LS London, 2011. doi:10.1145/2166896.2166926 http://www.biotec.tu-dresden.de/research/schroeder/dog4dag/
