ISSN 0929-9971
E-ISSN: 1569-9994

HYPHEN
A flexible, hybrid method to map phenotype concept mentions to terminological resources
Author(s): Paul Thompson¹ and Sophia Ananiadou¹
Affiliations:
¹ University of Manchester
Source: Terminology. International Journal of Theoretical and Applied Issues in Specialized Communication, Volume 24, Issue 1, Jan 2018, p. 91 - 121
DOI: https://doi.org/10.1075/term.00015.tho
- Version of Record published: 31 May 2018

Abstract

Narrative clinical records and biomedical articles constitute rich sources of information about phenotypes, i.e., markers distinguishing individuals with specific medical conditions from the general population. Phenotypes help clinicians to provide personalised treatments. However, locating information about them within huge document repositories is difficult, since each phenotypic concept can be mentioned in many ways. Normalisation methods automatically map divergent phrases to unique concepts in domain-specific terminologies, to allow location and linking of all mentions of a concept of interest. We have developed a hybrid normalisation method (HYPHEN) to handle concept mentions with wide-ranging characteristics, across different text types. HYPHEN integrates various normalisation techniques that handle surface-level variations (e.g., differences in word order, word forms or acronyms/abbreviations) and lexical-level variations (where terms have similar meanings, but potentially unrelated forms). HYPHEN achieves robust performance for both biomedical academic text and narrative clinical records, and has the ability to significantly outperform related methods.
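
To make the idea of normalisation concrete, the following is a minimal sketch of how surface-level variation might be handled. It is not the HYPHEN method: the terminology, the concept IDs, the abbreviation table, the stopword list, the crude plural stripping and the Jaccard token-overlap threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical toy terminology: concept ID -> preferred term (IDs are made up).
TERMINOLOGY = {
    "C001": "shortness of breath",
    "C002": "chronic obstructive pulmonary disease",
    "C003": "chronic cough",
}

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"copd": "chronic obstructive pulmonary disease"}

STOPWORDS = {"of", "the", "a", "an", "in"}


def tokens(phrase):
    """Return a set of tokens: lowercased, abbreviations expanded,
    stopwords dropped, and a trailing 's' crudely stripped."""
    words = re.findall(r"[a-z]+", phrase.lower())
    expanded = []
    for w in words:
        expanded.extend(ABBREVIATIONS.get(w, w).split())
    return {
        w[:-1] if w.endswith("s") and len(w) > 3 else w
        for w in expanded
        if w not in STOPWORDS
    }


def jaccard(a, b):
    """Token-set overlap; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def normalise(mention, threshold=0.5):
    """Map a mention to the best-matching concept ID, or None if no
    terminology entry reaches the (assumed) similarity threshold."""
    mention_tokens = tokens(mention)
    best_id, best_score = None, 0.0
    for concept_id, term in TERMINOLOGY.items():
        score = jaccard(mention_tokens, tokens(term))
        if score > best_score:
            best_id, best_score = concept_id, score
    return best_id if best_score >= threshold else None


print(normalise("breath shortness"))  # word-order variation
print(normalise("COPD"))              # abbreviation
print(normalise("chronic coughs"))    # word-form variation
print(normalise("dyspnoea"))          # lexical-level variation
```

Because token sets ignore order, the first three mentions resolve to a concept, whereas "dyspnoea" shares no tokens with "shortness of breath" and is left unmapped. That gap is the lexical-level variation described in the abstract, which matching on surface form alone cannot close.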
