Apache cTAKES: clinical Text Analysis and Knowledge Extraction System is an open-source natural language processing (NLP) system that extracts clinical information from the unstructured text of electronic health records. It processes clinical notes and identifies several types of clinical named entities: drugs, diseases/disorders, signs/symptoms, anatomical sites, and procedures. Each named entity has attributes for its text span, its ontology mapping code, its context (e.g., family history of, current, or unrelated to the patient), and whether it is negated.[1]
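The attribute list above can be made concrete with a minimal Python sketch of a single extracted mention stored as a record. The class and field names (`EntityMention`, `context`, `Polarity`, and so on) are hypothetical and do not reproduce cTAKES's actual Java type system. The ontology code is a placeholder rather than a real terminology code.

```python
from dataclasses import dataclass
from enum import Enum


class Polarity(Enum):
    # Negation status of a mention (hypothetical encoding).
    NOT_NEGATED = 1
    NEGATED = -1


@dataclass(frozen=True)
class EntityMention:
    # Hypothetical record; field names are illustrative, not cTAKES's own.
    begin: int            # character offset where the mention starts
    end: int              # character offset just past the mention
    semantic_type: str    # e.g. "drug", "disease/disorder", "sign/symptom"
    ontology_code: str    # code from the mapped terminology (placeholder here)
    context: str          # e.g. "current", "family_history", "unrelated_to_patient"
    polarity: Polarity

    def covered_text(self, note: str) -> str:
        # Recover the surface string from the note using the span offsets.
        return note[self.begin:self.end]


note = "Patient denies chest pain."
mention = EntityMention(
    begin=15,
    end=25,
    semantic_type="sign/symptom",
    ontology_code="CODE-PLACEHOLDER",
    context="current",
    polarity=Polarity.NEGATED,
)
print(mention.covered_text(note), mention.polarity.name)  # chest pain NEGATED
```

Storing character offsets instead of a copy of the text keeps every annotation anchored to the original note, so several overlapping annotations can refer to the same span.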
The components of cTAKES are trained specifically for the clinical domain and create rich linguistic and semantic annotations that can be used by clinical decision support systems and in clinical research.[4]
Development of cTAKES began at the Mayo Clinic in 2006. The development team, led by Dr. Guergana Savova and Dr. Christopher Chute, included physicians, computer scientists, and software engineers. After its deployment, cTAKES became an integral part of Mayo's clinical data management infrastructure, processing more than 80 million clinical notes.[5]
When Dr. Savova moved to Boston Children's Hospital in early 2010, the core development team expanded to include members there. Further external collaborations include:[5]
These collaborations have extended cTAKES' capabilities into areas such as temporal reasoning, clinical question answering, and coreference resolution for the clinical domain.[5]
In 2010, cTAKES was adopted by the i2b2 program, and it is a central component of SHARP Area 4.[5]
NegEx: a tool developed at the University of Pittsburgh to detect negated terms in clinical text. The system uses trigger terms to identify likely negation scenarios within a sentence.
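The trigger-term approach can be sketched in a few lines of Python. This is a simplified NegEx-style check, not the University of Pittsburgh implementation. The trigger lists are a tiny illustrative subset of a real negation lexicon, and the five-token scope window, the single-token termination terms, and the function names are all assumptions made for this example.

```python
import re

# Tiny illustrative trigger lists; a real negation lexicon is much larger.
PRE_NEGATION = ["no", "denies", "without", "no evidence of"]
POST_NEGATION = ["was ruled out", "is absent"]
TERMINATION = {"but", "however"}  # single-token scope terminators (assumed)
WINDOW = 5  # max tokens allowed between trigger and target (assumed)


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; punctuation is discarded.
    return re.findall(r"[a-z0-9]+", text.lower())


def find_phrase(tokens: list[str], phrase: list[str]) -> list[int]:
    # All start indices where the token sequence `phrase` occurs.
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def is_negated(sentence: str, target: str) -> bool:
    tokens = tokenize(sentence)
    target_toks = tokenize(target)
    hits = find_phrase(tokens, target_toks) if target_toks else []
    if not hits:
        return False
    t_start = hits[0]  # only the first occurrence is checked
    t_end = t_start + len(target_toks)

    # Pre-negation: trigger ends shortly before the target, and no
    # termination term sits between them.
    for trig in PRE_NEGATION:
        trig_toks = trig.split()
        for i in find_phrase(tokens, trig_toks):
            trig_end = i + len(trig_toks)
            if trig_end <= t_start and t_start - trig_end <= WINDOW:
                if not TERMINATION & set(tokens[trig_end:t_start]):
                    return True

    # Post-negation: trigger starts shortly after the target.
    for trig in POST_NEGATION:
        for i in find_phrase(tokens, trig.split()):
            if t_end <= i and i - t_end <= WINDOW:
                return True
    return False


print(is_negated("Patient denies chest pain.", "chest pain"))  # True
print(is_negated("No fever but has cough.", "cough"))          # False
```

In the second call, "but" closes the scope of "No", so "cough" is not treated as negated. NegEx itself also recognizes pseudo-negation phrases (for example "no increase") that look like triggers but are not; this sketch omits them.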
ConText: an extension of NegEx, also developed at the University of Pittsburgh. ConText extends NegEx to detect not only negated concepts but also temporality (recent, historical, or hypothetical scenarios) and the experiencer, that is, who the subject of the experience is (the patient or someone else).
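ConText's additional properties can be illustrated in the same trigger-based style by mapping trigger lists to temporality and experiencer values. The following sketch is a deliberate simplification and is hypothetical throughout. It applies triggers to the whole sentence rather than to a per-concept scope. Its short trigger lists, the precedence order, and the `context_of` function are assumptions, not ConText's lexicon or interface.

```python
import re

# Hypothetical trigger subsets for illustration only.
HYPOTHETICAL = ["if", "in case of"]
HISTORICAL = ["history of", "previous", "prior"]
OTHER_EXPERIENCER = ["mother", "father", "sister", "brother", "family history of"]


def contains_phrase(tokens: list[str], phrase: str) -> bool:
    # True if the words of `phrase` appear contiguously in `tokens`.
    words = phrase.split()
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))


def context_of(sentence: str) -> dict[str, str]:
    tokens = re.findall(r"[a-z]+", sentence.lower())

    def has_any(triggers: list[str]) -> bool:
        return any(contains_phrase(tokens, t) for t in triggers)

    # Hypothetical triggers take precedence over historical ones (assumed order).
    if has_any(HYPOTHETICAL):
        temporality = "hypothetical"
    elif has_any(HISTORICAL):
        temporality = "historical"
    else:
        temporality = "recent"  # default when no temporal trigger fires

    experiencer = "other" if has_any(OTHER_EXPERIENCER) else "patient"
    return {"temporality": temporality, "experiencer": experiencer}


print(context_of("Mother has diabetes."))
print(context_of("Return if fever develops."))
```

A fuller version would reuse the windowed, terminated scope from a NegEx-style checker so that each trigger affects only nearby concepts, not the entire sentence.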
MedEx: a tool for extracting medication information from clinical text. MedEx processes free-text clinical records to recognize medication names and signature information, such as drug dose, frequency, route, and duration. Use is free with a UMLS license. It is a standalone application for Linux and Windows.
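The signature fields named above (dose, frequency, route, duration) can be illustrated with a small rule-based extractor. This is not MedEx's method or API. The regular expressions, the three-word `DRUG_LEXICON` standing in for a drug dictionary, and `extract_signature` are all hypothetical and cover only a few common formats.

```python
import re
from typing import Optional

# Hypothetical stand-in for a real drug dictionary.
DRUG_LEXICON = {"amoxicillin", "metformin", "lisinopril"}

# Illustrative patterns for a few common signature formats only.
PATTERNS = {
    "dose": re.compile(r"\b\d+(?:\.\d+)?\s*(?:mg|mcg|g|ml|units?)\b", re.IGNORECASE),
    "route": re.compile(r"\b(?:po|iv|im|by mouth|orally)\b", re.IGNORECASE),
    "frequency": re.compile(r"\b(?:daily|bid|tid|qid|q\d+h|twice a day)\b", re.IGNORECASE),
    "duration": re.compile(r"\bfor\s+\d+\s+(?:days?|weeks?|months?)\b", re.IGNORECASE),
}


def extract_signature(text: str) -> dict[str, Optional[str]]:
    # Drug name: first word found in the (hypothetical) lexicon.
    words = re.findall(r"[a-z]+", text.lower())
    result: dict[str, Optional[str]] = {
        "name": next((w for w in words if w in DRUG_LEXICON), None)
    }
    # Signature fields: first match of each pattern, or None if absent.
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        result[field] = match.group(0) if match else None
    return result


print(extract_signature("Amoxicillin 500 mg PO TID for 10 days"))
```

Hand-written patterns like these break down quickly on real notes, where abbreviations, misspellings, and free-form phrasing vary widely. That variability is a large part of why dedicated tools such as MedEx exist.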
SecTag (section tagging hierarchy): recognizes note section headers using NLP, Bayesian, spelling-correction, and scoring techniques. Use is free with either a UMLS or a LOINC license.
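Header recognition that tolerates misspellings can be sketched with fuzzy string matching. This example uses Python's standard `difflib` similarity ratio as a substitute for SecTag's Bayesian and scoring techniques. The header list, the 0.85 cutoff, and `tag_section` are hypothetical. SecTag's own terminology is a far larger hierarchy.

```python
import difflib
import re
from typing import Optional

# Hypothetical canonical header list (a tiny flat stand-in for a hierarchy).
CANONICAL_HEADERS = [
    "chief complaint",
    "history of present illness",
    "past medical history",
    "medications",
    "allergies",
    "assessment and plan",
]


def tag_section(line: str, cutoff: float = 0.85) -> Optional[str]:
    # Treat "Some Words:" at the start of a line as a candidate header.
    m = re.match(r"^\s*([A-Za-z][A-Za-z /&]*?)\s*:", line)
    if not m:
        return None
    candidate = m.group(1).lower()
    # Fuzzy match tolerates small spelling errors such as "Histroy".
    matches = difflib.get_close_matches(candidate, CANONICAL_HEADERS, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(tag_section("Past Medical Histroy: hypertension"))
print(tag_section("Vitals: stable"))
```

The cutoff trades recall for precision. A lower value catches more misspellings but also starts mapping unrelated labels onto known headers.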
Stanford Named Entity Recognizer (NER): a conditional random field (CRF) sequence model, together with well-engineered features, for named entity recognition in English and German.
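A CRF tagger scores label sequences using features computed from the input tokens. The following Python sketch shows generic per-token features of the sort commonly used with linear-chain CRFs for NER: word identity, capitalization, word shape, affixes, and neighboring words. It is an illustration only, not the Stanford NER feature set, and `token_features` and `word_shape` are hypothetical names.

```python
def word_shape(word: str) -> str:
    # Map uppercase -> X, lowercase -> x, digits -> d; keep other characters.
    out = []
    for ch in word:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("d")
        else:
            out.append(ch)
    return "".join(out)


def token_features(tokens: list[str], i: int) -> dict[str, object]:
    # Feature dictionary for token i; a CRF typically learns a weight for
    # each (feature value, label) combination.
    word = tokens[i]
    feats: dict[str, object] = {
        "lower": word.lower(),
        "shape": word_shape(word),
        "prefix3": word[:3],
        "suffix3": word[-3:],
        "is_title": word.istitle(),
        "has_digit": any(c.isdigit() for c in word),
    }
    # Context features from neighboring tokens, with sentence-boundary markers.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


sentence = ["Seen", "at", "Mayo", "Clinic", "in", "2006"]
print(token_features(sentence, 2))
```

Shape features such as `Xxxx` help the model generalize to capitalized names it never saw in training, because many unseen names share the same shape.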
Stanford CoreNLP: an integrated suite of natural language processing tools for English, written in Java, including tokenization, part-of-speech tagging, named entity recognition, parsing, and coreference resolution.