- Pierre Holat,
- Nadi Tomeh,
- Thierry Charnois,
- Delphine Battistelli,
- Marie-Christine Jaulent &
- …
- Jean-Philippe Métivier
Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9897)
Abstract
In this paper, we tackle the issue of symptom recognition for rare diseases in biomedical texts. Symptoms typically have a more complex and ambiguous structure than other biomedical named entities. Furthermore, existing resources are scarce and incomplete. Therefore, we propose a weakly-supervised framework based on a combination of two approaches: sequential pattern mining under constraints and sequence labeling. We use unannotated abstracts of biomedical papers, together with dictionaries of rare diseases and symptoms, to create our training data. Our experiments show that both approaches outperform a simple projection of the dictionaries onto the text, and that their combination is beneficial. We also introduce a novel pattern mining constraint based on semantic similarity between words inside patterns.
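To make the idea of projecting a dictionary onto text concrete, here is a minimal sketch of how dictionary terms might be turned into BIO labels for weak supervision. It is an illustration only, not the authors' implementation: the function name `project_dictionary`, the `SYMPTOM` label, the longest-match strategy, the lowercase matching, and the sample terms and sentence are all assumptions made for this example.

```python
from typing import List, Set, Tuple


def project_dictionary(
    tokens: List[str],
    terms: Set[Tuple[str, ...]],
    label: str = "SYMPTOM",  # hypothetical label name
) -> List[str]:
    """Tag tokens with BIO labels by longest-match lookup of dictionary terms."""
    max_len = max((len(term) for term in terms), default=0)
    lowered = [tok.lower() for tok in tokens]  # assumption: case-insensitive match
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = 0
        # Try the longest candidate span first, so that a multi-word term
        # such as "muscle weakness" wins over the shorter "weakness".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(lowered[i:i + n]) in terms:
                matched = n
                break
        if matched:
            tags[i] = f"B-{label}"
            for j in range(i + 1, i + matched):
                tags[j] = f"I-{label}"
            i += matched
        else:
            i += 1
    return tags


# Invented toy dictionary and sentence, for illustration only.
symptoms = {("muscle", "weakness"), ("weakness",), ("fever",)}
sentence = "Patients present with muscle weakness and recurrent fever".split()
tags = project_dictionary(sentence, symptoms)
for token, tag in zip(sentence, tags):
    print(f"{token}\t{tag}")
```

Labels produced this way are noisy and incomplete, since they only cover terms already in the dictionary. This is the limitation that motivates training pattern-mining and sequence-labeling models on top of the projected annotations rather than using the projection alone.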
Notes
- 4. Using the script provided by http://www.cnts.ua.ac.be/conll2000/chunking/output.html, which takes the same input data format (BIO) as our data.
Acknowledgments
This work is supported by the French National Research Agency (ANR) as part of the project Hybride ANR-11-BS02-002 and the “Investissements d’Avenir” program (reference: ANR-10-LABX-0083).
Author information
Authors and Affiliations
LIPN, University of Paris 13, Sorbonne Paris Cité, Paris, France
Pierre Holat, Nadi Tomeh & Thierry Charnois
MoDyCo, University of Paris Ouest Nanterre La Défense, Paris, France
Delphine Battistelli
Inserm, Paris, France
Marie-Christine Jaulent
GREYC, University of Caen Basse-Normandie, Caen, France
Jean-Philippe Métivier
Corresponding author
Correspondence to Pierre Holat.
Editor information
Editors and Affiliations
Stockholm University, Stockholm, Sweden
Henrik Boström
Leiden University, Leiden, The Netherlands
Arno Knobbe
University of Porto, Porto, Portugal
Carlos Soares
Stockholm University, Stockholm, Sweden
Panagiotis Papapetrou
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Holat, P., Tomeh, N., Charnois, T., Battistelli, D., Jaulent, MC., Métivier, JP. (2016). Weakly-Supervised Symptom Recognition for Rare Diseases in Biomedical Text. In: Boström, H., Knobbe, A., Soares, C., Papapetrou, P. (eds) Advances in Intelligent Data Analysis XV. IDA 2016. Lecture Notes in Computer Science, vol 9897. Springer, Cham. https://doi.org/10.1007/978-3-319-46349-0_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46348-3
Online ISBN: 978-3-319-46349-0
eBook Packages: Computer Science, Computer Science (R0)