
Adapting Existing Natural Language Processing Resources for Cardiovascular Risk Factors Identification in Clinical Notes
Abdulrahman Khalifa
Stéphane Meystre
Issue date 2015 Dec.
Abstract
The 2014 i2b2 natural language processing shared task focused on identifying cardiovascular risk factors such as high blood pressure, high cholesterol levels, obesity, and smoking status, among other factors found in the health records of diabetic patients. In addition, the task involved detecting medications and time information associated with the extracted data. This paper presents the development and evaluation of a natural language processing (NLP) application conceived for this i2b2 shared task. For increased efficiency, the application's main components were adapted from two existing NLP tools implemented in the Apache UIMA framework: Textractor (for dictionary-based lookup) and cTAKES (for preprocessing and smoking status detection). The application achieved a final (micro-averaged) F1-measure of 87.5% on the final evaluation test set. Our attempt was mostly based on existing tools adapted with minimal changes and allowed for satisfactory performance with limited development effort.
1. Introduction
The 2014 i2b2 (Informatics for Integrating Biology and the Bedside) challenge proposed several different tasks: clinical text de-identification, cardiovascular risk factors identification, software usability assessment, and novel data uses. Our efforts focused on the second track, identifying risk factors for heart disease based on the automated analysis of narrative clinical records of diabetic patients [1]. The annotation guidelines for the task defined eight categories of information associated with increased risk for heart disease: 1) Diabetes, 2) Coronary Artery Disease (CAD), 3) Hyperlipidemia, 4) Hypertension, 5) Obesity, 6) Family history of CAD, 7) Smoking and 8) Medications associated with the aforementioned chronic diseases. Each category of information (except family history of CAD and smoking status) had to be described with indicator and time attributes. The indicator attribute captures indications of the risk factor in the clinical text. For instance, Diabetes could be identified using a mention of the disease (e.g., “patient has h/o DMII”), or a hemoglobin A1c value above 6.5 mg/dL (e.g., “7/18: A1c: 7.3”), while CAD could be identified using a mention (e.g., “PMH: significant for CAD”), or an event (e.g., “CABG in 1999”). The time attribute specifies the temporal relation to the Document Creation Time (DCT). It could take any one of the following values: before DCT, during DCT or after DCT. We refer the reader to [2] for a complete description of the annotation guidelines. For this challenge, we built a natural language processing (NLP) application based on Apache UIMA (Unstructured Information Management Architecture) [3], reusing existing tools developed to address similar tasks in previous i2b2 challenges. In this paper, we present our approach to extract relevant information from clinical notes, discuss performance results, and conclude with remarks about our experience adapting existing NLP tools.
2. Background
Extracting information from clinical notes has been the focus of a growing body of research in recent years [4]. Common characteristics of the narrative text used by physicians in electronic health records (e.g., telegraphic style, ambiguous abbreviations) make it difficult to access such information automatically. Natural Language Processing (NLP) techniques are needed to convert information from unstructured text to a structured form readily processable by computers [5,6]. This structured information can then be used to extract meaning and enable Clinical Decision Support (CDS) systems that assist healthcare professionals and improve health outcomes [7]. Among the earliest attempts to develop NLP applications in the medical domain, the LSP (Linguistic String Project) [8] and MedLEE (Medical Language Extraction and Encoding system) [9] were prominent examples. More recent applications include MetaMap [10], developed by the National Library of Medicine to map terms in biomedical text to concepts in the UMLS (Unified Medical Language System) Metathesaurus [11]. cTAKES [12] was developed at the Mayo Clinic and is described as a “large-scale, comprehensive, modular, extensible, robust, open-source” application based on Apache UIMA. It can be used to preprocess clinical text, find named entities and perform additional advanced NLP tasks such as coreference resolution. Textractor [13] is another UIMA-based application that was originally developed at the University of Utah to extract medications, their attributes, and reasons for their prescription from clinical notes.
When extracting information from clinical notes, NLP applications must take local contextual and temporal information into account for improved accuracy. Contextual information is important to determine if concepts are affirmed or negated (e.g., ‘denies any chest pain’), or if the subject of the information is the patient or someone else (e.g., ‘mother has diabetes’). Popular algorithms for negation detection in clinical notes include NegExpander [14] and NegEx [15]. Temporal information is critical to establish the chronological order of events described in patient notes and to resolve mentions of procedures or laboratory results to specific time points for accurate analysis [16,17]. The ConText algorithm [18] proposed by Chapman et al. is an extension of NegEx that allows analysis of contextual information like negation (negated, affirmed), temporality (historical, recent, hypothetical), and experiencer (patient, other). The development of NLP applications typically requires significant effort and relies on annotated clinical text for training and testing. Widely accessible and shared annotated corpora in the medical domain are still rare, mainly because of strict patient privacy rules. This scarcity has been an obstacle to developing state-of-the-art NLP approaches for clinical text [19]. To address this obstacle and enable direct comparison of NLP approaches in the clinical domain, i2b2 shared NLP tasks have been organized almost annually since 2006. The challenges started with automated de-identification [20] and smoking status detection [21] tasks. In 2008, the i2b2 challenge focused on identifying information about obesity and 15 co-morbidities [22]. In 2009, the third i2b2 challenge [23] focused on identifying medications and associated information such as dosage and frequency. This was followed by challenges for medical concept extraction, assertion and relations classification in 2010 [24], coreference resolution in 2011 [25], and temporal relations classification in 2012 [26].
To reduce development effort, many authors have reused NLP tools or resources such as ConText, sentence boundary detectors and part-of-speech taggers from the OpenNLP project [27], the Stanford parser [28], or the Weka machine learning framework [29], but the majority of their applications were still new developments. Reusing larger components or even existing NLP applications could reduce development effort further. A good example was the application developed by Wellner et al. [30] for the 2006 i2b2 de-identification task. It was based on the adaptation of two applications originally designed for recognizing named entities in newswire text. The process involved running the two applications out-of-the-box as a baseline and then gradually introducing a few task-specific features, using bias parameters to control feature weights, and adding lists of common English words during development to improve performance. With minimal effort, they were able to obtain very high performance for the task. Although their attempt used applications out-of-the-box as baselines, they had to re-train the models with new task-specific features to achieve high performance. Our attempt focused on adapting existing tools that were developed to solve similar tasks in the past, and on doing so without feature engineering or re-training of machine learning models.
3. Methods
3.1. Datasets
The i2b2 NLP shared task organizers distributed two annotated datasets (SET1 and SET2) to be used for development and training. These sets were released separately, a few weeks apart. SET1 was composed of 521 de-identified clinical notes and SET2 was composed of 269 de-identified notes; therefore, a total of 790 documents were available for training. The test set was released three days before final submission and consisted of a total of 514 de-identified clinical notes.
3.2. NLP Application Overview
As already mentioned, our application was based on the Apache UIMA framework, with components adapted from two existing applications. Because of the varied nature of the information to be extracted in this task, we experimented with different approaches for different categories of information. For example, Textractor’s dictionary-based lookup component was used to detect mentions of chronic diseases, in addition to mentions of CAD events as defined in the annotation guidelines. The results of the lookup module were then filtered using lists of UMLS Metathesaurus concept unique identifiers (CUIs) for disease and risk factor concepts defined for the task. Smoking status was identified using the existing classifier available from cTAKES. Medications and the various test results (hemoglobin A1c, glucose, blood pressure, cholesterol, etc.) were identified using pattern matching with regular expressions. Family history of CAD was detected by modifying the contextual analysis of the detected CAD mentions using ConText’s ‘experiencer’ analysis.
The application pipeline is depicted in Figure 1 and described below. The analysis of clinical text begins with a preprocessing stage that consists of segmenting the text into sections, splitting it into sentences, tokenizing, and assigning part-of-speech tags to the input text with cTAKES. This is followed by running the smoking status classifier from cTAKES “out-of-the-box” to assign each patient record to a smoking status category: CURRENT, PAST, EVER, NEVER, UNKNOWN. The existing cTAKES SMOKER label was changed to EVER, as defined for this i2b2 task.
Figure 1.
Overview of the NLP application pipeline with adapted components from cTAKES and Textractor
The text analysis then continues with rule-based pattern matching modules for detecting medications and laboratory test results. Medications were detected with a manually curated terminology of synonymous terms and abbreviations linked to each medication category. These lists were compiled using UMLS Metathesaurus terminologies and lists of common abbreviations found in clinical narratives (manually built by local domain experts), and the concepts were then manually grouped into medication categories. The number of terms used for each medication category varied widely, ranging from as few as 3 (e.g., for metformin) to more than 50 (e.g., for beta blockers and aspirin). Laboratory test results and vital signs were detected using regular expressions and the associated values were compared with abnormality thresholds defined in the guidelines. For instance, the phrase “Cholesterol-LDL 08/26/2091 148” indicates an LDL cholesterol concentration of 148 mg/dL, which is above the normal concentration of 100 mg/dL and should therefore be included as a risk factor. Special attention was paid to avoid incorrect values that were part of other numeric expressions (e.g., dates) by restricting regular expression matches to reasonable value ranges and imposing specific conditions on number boundaries (see examples in Table 1). Two regular expressions were used for each relevant laboratory test or vital sign indicator: one for capturing the term and the other for the numerical value associated with the laboratory test or vital sign.
Table 1.
Examples of regular expressions used for matching test mentions and values.
Laboratory/Test | Regular expression for mention | Regular expression for value |
---|---|---|
Glucose (for Diabetes mellitus) | (fasting)?(blood)?(glucose|\bGLU(-poc)?\b|\bBG\b|(blood)sugar(s)?|\bFS\b|\bBS\b|fingerstick|\bFG\b) | (?<!/|\d)(\d\d\d?)(-\d\d\d)?(?!/|\d|\w) |
Blood Pressure (for Hypertension) | (?<!\w)((s)?BP[s]?|b/p|((blood|systolic)[ ]+pressure[s]?)|hypertensive)[:]?(?!\w) | (?<!/|\d)(\d\d\d)/(\d\d\d?)(?!/\d|\d) |
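The two-pattern strategy described above (one expression for the term, one for the value) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the application's actual UIMA code: the value patterns are copied from Table 1 with the minus sign written as an ASCII hyphen, while the `LDL_MENTION` pattern, the case-insensitive matching, the reuse of the glucose-style value pattern for LDL, and the helper names are all hypothetical. The 100 mg/dL LDL threshold is the one quoted in the text.

```python
import re

# Value patterns copied from Table 1 (minus sign written as an ASCII hyphen).
GLUCOSE_STYLE_VALUE = re.compile(r"(?<!/|\d)(\d\d\d?)(-\d\d\d)?(?!/|\d|\w)")
BP_VALUE = re.compile(r"(?<!/|\d)(\d\d\d)/(\d\d\d?)(?!/\d|\d)")

# Blood pressure mention pattern from Table 1 (case-insensitivity is an assumption).
BP_MENTION = re.compile(
    r"(?<!\w)((s)?BP[s]?|b/p|((blood|systolic)[ ]+pressure[s]?)|hypertensive)[:]?(?!\w)",
    re.IGNORECASE,
)

# Hypothetical LDL mention pattern; it is not taken from the paper.
LDL_MENTION = re.compile(r"\bLDL\b", re.IGNORECASE)

# Threshold quoted in the text: LDL above 100 mg/dL counts as a risk factor.
LDL_THRESHOLD = 100


def first_value_after(mention_re, value_re, text):
    """Return the first value match located after the first mention, or None."""
    mention = mention_re.search(text)
    if mention is None:
        return None
    # Start the value search where the mention ends; the lookarounds in the
    # value pattern reject digits that belong to dates such as 08/26/2091.
    return value_re.search(text, mention.end())


def ldl_is_risk_factor(text):
    """True when an LDL value above the threshold follows an LDL mention."""
    match = first_value_after(LDL_MENTION, GLUCOSE_STYLE_VALUE, text)
    return match is not None and int(match.group(1)) > LDL_THRESHOLD
```

For the phrase “Cholesterol-LDL 08/26/2091 148”, the date fragments are rejected by the boundary conditions and `ldl_is_risk_factor` returns `True` because 148 exceeds 100.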
The application then proceeded with the UMLS Metathesaurus lookup module from Textractor. This module uses Apache Lucene-based [31] dictionary indexes to detect disease and risk factor terms. Before the dictionary lookup, acronyms were expanded and tokens normalized by removing unwanted stopwords. The lookup module then matched terms that belonged to one of the predefined UMLS semantic types for diseases (i.e., T019, T033, T046, T047 and T061). Matching was performed at the token level first, and then expanded to match at the noun phrase chunk level. All detected concepts were then filtered based on their CUIs to only include concepts belonging to one of the five disease and risk factor categories identified in the guidelines: CAD, Diabetes mellitus, Obesity, Hyperlipidemia, and Hypertension.
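A minimal sketch of the lookup-then-filter idea is shown below. It uses an in-memory Python dictionary rather than Lucene indexes, matches the longest token sequence first rather than working from noun phrase chunks, and all dictionary entries, acronym expansions, stopwords, and concept identifiers are hypothetical placeholders (they are not real UMLS CUIs).

```python
import re

# Hypothetical term dictionary; the identifiers are placeholders, not UMLS CUIs.
DICTIONARY = {
    "coronary artery disease": "CUI_CAD",
    "diabetes mellitus": "CUI_DM",
    "hypertension": "CUI_HTN",
    "hyperlipidemia": "CUI_HLD",
    "obesity": "CUI_OBESITY",
    "chest pain": "CUI_CHEST_PAIN",
}
# Hypothetical acronym expansions and stopword list.
ACRONYMS = {"cad": "coronary artery disease", "htn": "hypertension"}
STOPWORDS = {"the", "of", "a", "an"}
# Only concepts in one of the five task categories survive the filter.
CATEGORY_BY_CUI = {
    "CUI_CAD": "CAD",
    "CUI_DM": "DIABETES",
    "CUI_HTN": "HYPERTENSION",
    "CUI_HLD": "HYPERLIPIDEMIA",
    "CUI_OBESITY": "OBESITY",
}


def normalize(text):
    """Lowercase, tokenize, expand acronyms, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    expanded = []
    for token in tokens:
        expanded.extend(ACRONYMS.get(token, token).split())
    return [t for t in expanded if t not in STOPWORDS]


def lookup(text, max_len=4):
    """Greedy longest-match dictionary lookup over the normalized tokens."""
    tokens = normalize(text)
    found = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            cui = DICTIONARY.get(phrase)
            if cui is not None:
                found.append((phrase, cui))
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this token
    return found


def filter_by_category(concepts):
    """Keep only concepts whose identifier maps to a task category."""
    return [(p, CATEGORY_BY_CUI[c]) for p, c in concepts if c in CATEGORY_BY_CUI]
```

In this toy setup, “chest pain” is found by the lookup but dropped by the filter because its identifier is not in any of the five categories.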
Finally, the application performed contextual analysis of all extracted and filtered information to exclude negated concepts, verify that the patient was the experiencer, and produce time attributes for each concept in relation to the DCT. Negation and experiencer analysis was performed using a local implementation of the ConText algorithm, as available in Textractor. Detection of family history of CAD was handled by considering all extracted CAD concepts with an experiencer other than the patient (e.g., “mother has history of CAD”) as a present family history of CAD. If all CAD concepts were identified as belonging to the patient, or if no CAD concepts were found in the clinical note, then family history of CAD was set to not present.
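The family history rule can be expressed compactly once each concept carries its contextual attributes. The sketch below assumes a hypothetical `Concept` record whose `negated` and `experiencer` fields have already been filled in by a ConText-style analysis; it does not implement ConText itself, and the field and function names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Concept:
    category: str      # e.g. "CAD", "DIABETES"
    text: str          # matched text span
    negated: bool      # output of the negation analysis
    experiencer: str   # "patient" or "other"


def family_history_of_cad(concepts):
    """'present' if any CAD concept has a non-patient experiencer."""
    for concept in concepts:
        if concept.category == "CAD" and concept.experiencer != "patient":
            return "present"
    # All CAD concepts belong to the patient, or no CAD concept was found.
    return "not present"


def patient_concepts(concepts):
    """Keep affirmed concepts whose experiencer is the patient."""
    return [c for c in concepts if not c.negated and c.experiencer == "patient"]
```

Whether negated mentions about a relative should also count toward family history is not specified in the text; this sketch ignores negation for that rule.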
We experimented with various uses of ConText’s temporal analysis (i.e., concepts classified as recent, historical or hypothetical) in order to map them to the corresponding time attribute values (i.e., before DCT, during DCT or after DCT). However, initial results on the training data using this approach were not satisfactory. As an alternative approach, we used the most common time value found for each category of information in the training data. For example, chronic diseases such as CAD and most medications were continuing (i.e., existed before, during, and after the hospital stay or visit) and were therefore annotated with all three time attribute values in the reference standard. As another example, laboratory test results varied, with hemoglobin A1c and glucose tests mostly annotated as ‘before DCT’, and others, like high blood pressure measurements, mostly annotated as ‘during DCT’.
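The majority-value fallback described above amounts to counting time values per category in the training annotations and always emitting the winner. The following sketch illustrates this; the annotation tuples, the category labels, and the use of a single "continuing" label that expands to all three values are assumptions made for the example, and the toy data are invented.

```python
from collections import Counter, defaultdict

# "continuing" is shorthand here for all three time values at once.
TIME_EXPANSION = {
    "continuing": ["before DCT", "during DCT", "after DCT"],
}


def most_common_time(training_annotations):
    """Map each category to its most frequent time value in the training data."""
    counts = defaultdict(Counter)
    for category, time_value in training_annotations:
        counts[category][time_value] += 1
    return {cat: counter.most_common(1)[0][0] for cat, counter in counts.items()}


def assign_time(category, defaults):
    """Return the list of time attribute values to emit for a detected concept."""
    value = defaults[category]
    return TIME_EXPANSION.get(value, [value])


# Invented toy data, only to show the mechanics.
training = [
    ("CAD mention", "continuing"),
    ("CAD mention", "continuing"),
    ("CAD mention", "before DCT"),
    ("A1C", "before DCT"),
    ("A1C", "before DCT"),
    ("A1C", "during DCT"),
    ("high BP", "during DCT"),
]
defaults = most_common_time(training)
```

With these toy counts, a detected CAD mention receives all three time values, while an A1C result receives only ‘before DCT’.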
4. Results
After development and refinement based on the training corpus (SET1 and SET2), the NLP application processed the testing corpus when it was made available, and the application output was sent to the shared task organizers for analysis. The application output was compared with the reference standard using the evaluation script provided by the shared task organizers, and all extracted information was classified as true positive (i.e., output matches the reference standard), false positive, or false negative. Metrics used included recall, precision, and the F1-measure (details in [1]). The results for each class of information are presented in Table 2. For overall averages, both macro- and micro-averages are included. Each separate class-indicator combination is reported using micro-averages only. The evaluation script contained an option to calculate results separately for each class of information using the –filter option. It also allowed computing results for specific class and indicator attribute values, such as the class DIABETES and the indicator attribute value of mention, using the option –conjunctive. Results for each disease category are presented for mention and each disease-specific indicator separately, as in the annotation guidelines. The SMOKING category results are presented as status only, and MEDICATION results are aggregated for all the categories correctly identified in the clinical records. All results in the table were computed for all three values of the time attribute for each class, and no attempt was made to separate ‘before DCT’, ‘during DCT’ and ‘after DCT’ results for each class.
Table 2.
Macro- and micro-averaged overall results including the micro-averaged breakdown of final results for every class of information given in terms of Precision, Recall and F1-measure.
Class | Indicator | Precision | Recall | F1-measure |
---|---|---|---|---|
CAD | mention | 0.883 | 0.9651 | 0.9222 |
CAD | symptom | 0.2095 | 0.4429 | 0.2844 |
CAD | event | 0.6457 | 0.5899 | 0.6165 |
CAD | test | 0.4557 | 0.6102 | 0.5217 |
DIABETES | mention | 0.9512 | 0.9887 | 0.9696 |
DIABETES | A1C | 0.8611 | 0.7561 | 0.8052 |
DIABETES | glucose | 0.1486 | 0.3333 | 0.2056 |
HYPERLIPIDEMIA | mention | 0.9899 | 0.827 | 0.9011 |
HYPERLIPIDEMIA | high cholesterol | 0.5714 | 0.3636 | 0.4444 |
HYPERLIPIDEMIA | high LDL | 0.84 | 0.7241 | 0.7778 |
HYPERTENSION | mention | 0.9918 | 0.9891 | 0.9904 |
HYPERTENSION | high BP | 0.8571 | 0.5231 | 0.6497 |
OBESITY | mention | 0.7562 | 1.0 | 0.8612 |
OBESITY | BMI | 0.9231 | 0.7059 | 0.8 |
SMOKING | status | 0.8638 | 0.8672 | 0.8655 |
MEDICATION | | 0.8282 | 0.8911 | 0.8585 |
FAM. HIST. of CAD | | 0.9494 | 0.9494 | 0.9494 |
Macro-average | | 0.8494 | 0.8914 | 0.8699 |
Micro-average | | 0.8552 | 0.8951 | 0.8747 |
As shown in Table 2, the application achieved an overall micro-averaged F1-measure of 87.47% and a macro-averaged F1-measure of 86.99%. In most disease categories, accuracy was highest for mentions of disease, with micro-averaged F1-measures of 92.22%, 94.94%, 96.96%, 90.11%, and 99.04% for CAD, family history of CAD, Diabetes, Hyperlipidemia, and Hypertension, respectively. Medications, mentions of Obesity, and Smoking status identification reached micro-averaged F1-measures of 85.85%, 86.12% and 86.55%, respectively. Accuracy was lower for other information categories such as laboratory tests, CAD events and symptoms, with F1-measures ranging from 20.56% to 80%.
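As a reminder of how these figures relate, the F1-measure is the harmonic mean of precision and recall, and micro-averaging pools true positive, false positive, and false negative counts before computing the metrics. The sketch below uses the standard definitions; it is not the organizers' evaluation script, and the function names are illustrative.

```python
def f1_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def micro_average(counts):
    """Pool (tp, fp, fn) triples, then compute precision, recall, and F1."""
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_measure(precision, recall)


# Consistency check against the overall rows of Table 2.
micro_f1 = round(f1_measure(0.8552, 0.8951), 4)  # 0.8747
macro_f1 = round(f1_measure(0.8494, 0.8914), 4)  # 0.8699
```

Both overall F1 values in Table 2 are reproduced from the reported precision and recall.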
5. Discussion
As presented above, the application accuracy for mentions of the various diseases, smoking status, medications and family history was higher than accuracy for any other indicator type defined in the annotation guidelines (e.g., laboratory tests, CAD events and symptoms). The dictionary lookup approach with terminological content from the UMLS Metathesaurus for detecting disease mentions was successful for this task. Similarly, the smoking status classifier from cTAKES successfully identified and classified smoking status information (F1-measure of about 87%) despite the fact that the model was used out-of-the-box, without any training on the new corpus for the current i2b2 NLP task. The identification of medications and their attributes reached an F1-measure of about 86% when using regular expressions and manually curated lists of terms, demonstrating the feasibility of this approach for the type of narrative notes used in this shared task. The precision obtained for medications was lower (83%) than recall (89%) and hence affected the final F1-measure. This is mainly due to the way we chose to generate the time attribute by using the continuing times scenario (i.e., generating ‘before DCT’, ‘during DCT’ and ‘after DCT’ temporal information tags for every medication detected in the notes). This approach necessarily produces false positives when a medication is annotated with only one or two of the time values in the clinical notes. In addition, since the medication term lists were created manually, some spelling variations and terms could have been missed, therefore producing some false negatives and affecting overall recall. An example of spelling variation is the term ‘nitroglycerine’ in the nitrate group category, which appeared in both corpora as ‘nitroglycerin’. The latter was not in the nitrate list used by our application and hence caused some false negatives. An example of completely missed terms was sublingual nitroglycerin mentioned as ‘SL NTG’.
Among disease mentions, the Hyperlipidemia class had the lowest recall (83%) and Obesity had the lowest precision (76%). The former was mostly due to some clinical reports containing annotations for Hyperlipidemia mentions appearing as ‘elevated serum cholesterol’, ‘elevated lipids’ and ‘high cholesterol’ that were missed by our application because of inaccurate chunking. In addition, we did not have the corresponding CUI codes for some of them in our dictionary lookup module. There were at least two cases in the testing corpus where Hyperlipidemia was mentioned directly following a word with no space in between, such as ‘hemodialysisHyperlipidemia’, which our application also missed. The low precision with Obesity was caused by including the UMLS concept ‘overweight’ in our list of CUIs for Obesity. Although ‘overweight’ was used as an indicator for obesity in one record in the reference standard corpora, its use produced many false positives since ‘overweight’ often does not indicate obesity. There were also false positive mentions of Obesity produced by our application in cases where ‘obese’ was mentioned without indicating Obesity (e.g., “abdomen is slightly obese” and “Abdomen: Moderately obese”). The other indicators for diseases and risk factors were quite challenging and our approach using regular expressions at the lexical level was not always effective. With the exception of hemoglobin A1c laboratory tests (for Diabetes), BMI (for Obesity), and cholesterol LDL (for Hyperlipidemia), the application performance was modest, with an F1-measure ranging from 21% for the blood glucose indicator up to 65% for the blood pressure indicator. Some of the challenges with these indicators are summarized below:
Lexical and spelling variations: Some laboratory indicators for diseases are mentioned with many lexical variations and acronyms. Table 1 shows the regular expressions used to capture blood glucose for diabetes and blood pressure for hypertension. As shown, glucose can be described with a variety of terms like BG, BS, FS and FG, and blood pressure can be described with terms like BP and b/p. This illustrates one limitation of our approach, and a comprehensive strategy to deal with this issue would be needed to enable better accuracy.
Extracting laboratory numerical results accurately: When the application finds matching terms for laboratory or test indicators, it must proceed with extracting associated numerical values and compare them to threshold levels for abnormality. Extracting numerical values may be straightforward when they immediately follow the term and are expressed as single units, such as in the phrase “FSBG was 353”. However, other phrases can be more challenging, like “FG 120–199; now 68–172, although 172 = outlier, mostly in the 70–130”. In this case, ranges of values are expressed with ‘–’, and multiple units are expressed with temporal and frequency modifiers (i.e., ‘now’ and ‘mostly’).
Training data sparseness: The number of training examples available was sometimes too low to allow for the variety needed for adequate application generalization. For instance, in the case of the cholesterol indicator for Hyperlipidemia, the total number of available annotations was only 9 in the whole set of 790 training documents. In contrast, there were about 33 annotations available for the LDL indicator.
Complex time analysis: Test and laboratory indicators require more sophisticated time attribute analysis, and this is another limitation of our approach. Unlike chronic disease mention annotations, which were mostly characterized by the ‘continuing’ time attribute (i.e., before, during and after DCT), most of the laboratory and vital sign annotations were characterized by a variety of time attribute values. For instance, hemoglobin A1c and glucose tests were usually conducted in a prior visit and hence mostly annotated with ‘before DCT’, while blood pressure (BP) was mostly measured during the patient visit and hence mostly had the ‘during DCT’ time value. To examine the impact of time attributes on the performance of our application, we followed the “fixed” evaluation procedure described in [32] and produced results for some indicators after replacing the value of the time attribute with ‘before DCT’ in all annotations from our application output and in the testing reference standard (see Table 3). This evaluation considers true positives, false positives and false negatives for each individual annotation while ignoring the time attribute (i.e., application output is not penalized for incorrect time values). As shown in Table 3, the performance of our application improved when the time component was ignored in the evaluation (compare with results from Table 2). Our decision to use the most common time attribute values for each of these indicators caused a loss in precision and recall, contributing to a lower overall F1-measure.
Table 3.
Results for Medications and some disease indicators after fixing the time attribute to the same value in both application output and testing reference standard.
Indicator | Precision | Recall | F1-measure |
---|---|---|---|
Glucose | 0.2568 | 0.6129 | 0.3619 |
High Cholesterol | 0.7143 | 0.5 | 0.5882 |
High BP | 0.8908 | 0.5792 | 0.702 |
MEDICATIONS | 0.8791 | 0.8826 | 0.8808 |
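The “fixed” evaluation can be illustrated with a small sketch that overwrites the time attribute on both sides before counting matches. The tuple layout `(document, class, indicator, time)` and the set-based matching are simplifying assumptions made for this example; the procedure actually used is the one described in [32], and the toy annotations below are invented.

```python
def fix_time(annotations, fixed_value="before DCT"):
    """Replace the time attribute of every annotation with one fixed value."""
    return {(doc, cls, ind, fixed_value) for doc, cls, ind, _ in annotations}


def score(system, gold):
    """Count true positives, false positives, and false negatives."""
    tp = len(system & gold)
    fp = len(system - gold)
    fn = len(gold - system)
    return tp, fp, fn


# Invented toy annotations: the system found the right indicator for
# hypertension but assigned the wrong time value.
system = {
    ("doc1", "HYPERTENSION", "high BP", "during DCT"),
    ("doc1", "DIABETES", "glucose", "before DCT"),
}
gold = {
    ("doc1", "HYPERTENSION", "high BP", "before DCT"),
    ("doc1", "DIABETES", "glucose", "before DCT"),
}

strict = score(system, gold)                      # time value must match
fixed = score(fix_time(system), fix_time(gold))   # time value ignored
```

In the strict setting the mistimed annotation counts as both a false positive and a false negative; after fixing the time attribute it becomes a true positive, which is the effect visible when comparing Table 3 with Table 2.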
6. Conclusion
Our rapid approach, adapting resources from existing applications for the 2014 i2b2 challenge, allowed for performance similar to that of other, more sophisticated applications developed for this task, which used additional manual annotations or multiple machine learning classifiers [1]. We think that existing NLP resources should be reused, and most can be adapted and used at least as a baseline for future tasks in the clinical domain. Improvements for future attempts should focus on a comprehensive strategy to tackle spelling errors and variations, acronym disambiguation, and more refined temporal analysis. Standard terminologies, as available in the UMLS Metathesaurus, should be the basis for these clinical information extraction tasks, as they already contain well-defined concepts associated with multiple terms. Finally, regular expressions and pattern matching can be useful for extracting information such as name-value pairs from short phrases (e.g., ‘Cholesterol-LDL 08/26/2091 148’). However, longer phrases containing complex syntactic structures require the use of advanced parsing techniques to identify constituents and the relations between them. In the future, we plan to explore advanced techniques such as dependency parsing or semantic role labeling to reduce errors appearing with long phrases that require deeper contextual analysis to be accurately extracted. For instance, in the following sentence: “Prior to her bypass surgery on the right leg, she underwent a Persantine MIBI which showed only 1 mm ST depressions and was considered not diagnostic”, it is important for an application to link the negated phrase “was considered not diagnostic” with the noun phrase “Persantine MIBI” to conclude that although the patient had the MIBI test performed, the result was not diagnostic and therefore the test indicator (i.e., ‘MIBI’) ruled out CAD.
Highlights.
We used natural language processing (NLP) to extract heart disease risk factors
Components were adapted from two existing NLP applications
We used existing tools without feature engineering or re-training of models
Our system achieved an overall micro-averaged F1-measure of 87.47%
Adapting existing tools allowed for performance comparable to sophisticated systems
Contributor Information
Abdulrahman Khalifa, Email: abdulrahman.aal@utah.edu.
Stéphane Meystre, Email: stephane.meystre@utah.edu.
References
- 1.Stubbs A, Kotfila C, Xu H, Uzuner Ö. Practical Applications for NLP in Clinical Research: the 2014i2b2/UTHealth Shared Tasks. Proceedings of the i2b2 2014 Shared Task and Workshop Challenges inNatural Language Processing for Clinical Data. 2015 (in press) [Google Scholar]
- 2.Stubbs A, Uzuner Ö. Annotating Risk Factors for Heart Disease in Clinical Narrativesfor Diabetic Patients. Proceedings of the i2b2 2014 Shared Task and Workshop Challenges inNatural Language Processing for Clinical Data. 2015 doi: 10.1016/j.jbi.2015.05.009. (in press) [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Ferrucci D, Lally A. Uima: an architectural approach to unstructured informationprocessing in the corporate research environment. Natural Language Engineering. 2004;10(3–4):327–348. [Google Scholar]
- 4.Meystre SM, Savova GK, Kipper-Schuler KC, Hurdle JF, et al. Extracting information from textual documents in the electronichealth record: a review of recent research. Yearb Med Inform. 2008;35:128–44. [PubMed] [Google Scholar]
- 5.Pratt A. Medicine computers and linguistics. Biomed Eng. 1973:87–140. [Google Scholar]
- 6.Nadkarni PM, Ohno-Machado L, Chapman WW. Natural language processing: an introduction. Journal of the American Medical Informatics Association. 2011;18(5):544–551. doi: 10.1136/amiajnl-2011-000464. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Demner-Fushman D, Chapman WW, McDonald CJ. What can natural language processing do for clinical decisionsupport? Journal of biomedical informatics. 2009;42(5):760–772. doi: 10.1016/j.jbi.2009.08.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Chi E, Lyman M, Sager N, Friedman C, Macleod C. A database of computer-structured narrative: methods of computingcomplex relations, in: Proceedings of the Annual Symposium on ComputerApplication in Medical Care. American Medical Informatics Association. 1985:221. [Google Scholar]
- 9.Friedman C, Johnson SB, Forman B, Starren J. Architectural requirements for a multipurpose natural languageprocessor in the clinical environment. Proceedings of the Annual Symposium on Computer Application in MedicalCare, American Medical Informatics Association. 1995:347. [PMC free article] [PubMed] [Google Scholar]
- 10.Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus:the MetaMap program. Proceedings of the AMIA Symposium, American Medical InformaticsAssociation. 2001:17. [PMC free article] [PubMed] [Google Scholar]
- 11.Bodenreider O. The unified medical language system (umls): integratingbiomedical terminology. Nucleic acids research. 2004;32(suppl 1):D267–D270. doi: 10.1093/nar/gkh061. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, Chute CG. Mayo clinical Text Analysis and Knowledge Extraction System(cTAKES): architecture, component evaluation andapplications. Journal of the American Medical Informatics Association. 2010;17(5):507–513. doi: 10.1136/jamia.2009.001560. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13.Meystre SM, Thibault J, Shen S, Hurdle JF, South BR. Textractor: a hybrid system for medications and reason for theirprescription extraction from clinical text documents. Journal of the American Medical Informatics Association. 2010;17(5):559–562. doi: 10.1136/jamia.2010.004028. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Aronow DB, Fangfang F, Croft WB. Ad hoc classification of radiology reports. Journal of the American Medical Informatics Association. 1999;6(5):393–411. doi: 10.1136/jamia.1999.0060393. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Chapman WW, Bridewell W, Hanbury P, Cooper GF, Buchanan BG. A simple algorithm for identifying negated findings and diseasesin discharge summaries. Journal of biomedical informatics. 2001;34(5):301–310. doi: 10.1006/jbin.2001.1029. [DOI] [PubMed] [Google Scholar]
- 16. Zhou L, Melton GB, Parsons S, Hripcsak G. A temporal constraint structure for extracting temporal information from clinical narrative. Journal of biomedical informatics. 2006;39(4):424–439. doi: 10.1016/j.jbi.2005.07.002.
- 17. Bramsen P, Deshpande P, Lee YK, Barzilay R. Finding temporal order in discharge summaries. AMIA annual symposium proceedings, Vol. 2006, American Medical Informatics Association. 2006:81.
- 18. Chapman WW, Chu D, Dowling JN. ConText: An algorithm for identifying contextual features from clinical text. Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing, Association for Computational Linguistics. 2007:81–88.
- 19. Chapman WW, Nadkarni PM, Hirschman L, D’Avolio LW, Savova GK, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. Journal of the American Medical Informatics Association. 2011;18(5):540–543. doi: 10.1136/amiajnl-2011-000465.
- 20. Uzuner Ö, Luo Y, Szolovits P. Evaluating the state-of-the-art in automatic de-identification. Journal of the American Medical Informatics Association. 2007;14(5):550–563. doi: 10.1197/jamia.M2444.
- 21. Uzuner Ö, Goldstein I, Luo Y, Kohane I. Identifying patient smoking status from medical discharge records. Journal of the American Medical Informatics Association. 2008;15(1):14–24. doi: 10.1197/jamia.M2408.
- 22. Uzuner O. Second i2b2 workshop on natural language processing challenges for clinical records. AMIA… Annual Symposium proceedings/AMIA Symposium, AMIA Symposium. 2007:1252–1253.
- 23. Uzuner Ö, Solti I, Cadag E. Extracting medication information from clinical text. Journal of the American Medical Informatics Association. 2010;17(5):514–518. doi: 10.1136/jamia.2010.003947.
- 24. Uzuner Ö, South BR, Shen S, DuVall SL. 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. Journal of the American Medical Informatics Association. doi: 10.1136/amiajnl-2011-000203.
- 25. Uzuner O, Bodnari A, Shen S, Forbush T, Pestian J, South BR. Evaluating the state of the art in coreference resolution for electronic medical records. Journal of the American Medical Informatics Association. 2012. doi: 10.1136/amiajnl-2011-000784.
- 26. Sun W, Rumshisky A, Uzuner O. Evaluating temporal relations in clinical text: 2012 i2b2 challenge. Journal of the American Medical Informatics Association. 2013. doi: 10.1136/amiajnl-2013-001628.
- 27. Morton T, Kottmann J, Baldridge J, Bierner G. OpenNLP: a Java-based NLP toolkit. 2005.
- 28. De Marneffe MC, MacCartney B, Manning CD, et al. Generating typed dependency parses from phrase structure parses. Proceedings of LREC. 2006;6:449–454.
- 29. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. ACM SIGKDD explorations newsletter. 2009;11(1):10–18.
- 30. Wellner B, Huyck M, Mardis S, Aberdeen J, Morgan A, Peshkin L, Yeh A, Hitzeman J, Hirschman L. Rapidly retargetable approaches to de-identification in medical records. Journal of the American Medical Informatics Association. 2007;14(5):564–573. doi: 10.1197/jamia.M2435.
- 31. Bialecki A, Muir R, Ingersoll G. Apache Lucene 4. SIGIR 2012 workshop on open source information retrieval. 2012:17–24.
- 32. Grouin C, Moriceau V, Zweigenbaum P. Combining glass box and black box evaluations in the identification of heart disease risk factors and their temporal relations from clinical records. Journal of biomedical informatics. doi: 10.1016/j.jbi.2015.06.014.