
Apache cTAKES

From Wikipedia, the free encyclopedia

Natural language processing system


Developer(s)	Apache Software Foundation

Stable release	6.0.0 / September 16, 2024

Repository	cTakes Repository
Written in	Java, Scala, Python
Operating system	Cross-platform
Type	Natural language processing, Bioinformatics, Text mining, Information extraction
License	Apache License 2.0
Website	Official website

Apache cTAKES (clinical Text Analysis and Knowledge Extraction System) is an open-source natural language processing (NLP) system that extracts clinical information from unstructured text in electronic health records. It processes clinical notes and identifies several types of clinical named entities: drugs, diseases/disorders, signs/symptoms, anatomical sites, and procedures. Each named entity has attributes for the text span, the ontology mapping code, the context (family history of, current, or unrelated to patient), and whether it is negated.^[1]
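
The entity attributes listed above can be pictured as a small record type. The following is a minimal sketch in Python, not the cTAKES type system: the class name `ClinicalEntity`, its field names, the context labels, and the `EXAMPLE-…` codes are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClinicalEntity:
    """Hypothetical record for one extracted mention (illustration only)."""
    begin: int                    # start offset of the text span (inclusive)
    end: int                      # end offset of the text span (exclusive)
    semantic_type: str            # e.g. "drug", "disease/disorder", "sign/symptom"
    ontology_code: Optional[str]  # placeholder ontology code; None if unmapped
    context: str = "current"      # "current", "family_history" or "unrelated"
    negated: bool = False         # True if the mention is negated in the note

    def covered_text(self, note: str) -> str:
        """Return the substring of the note that this entity spans."""
        return note[self.begin:self.end]


note = "Patient denies chest pain. Mother had diabetes."

# Offsets and codes are hand-written for this example, not produced by cTAKES.
chest_pain = ClinicalEntity(15, 25, "sign/symptom", "EXAMPLE-0001", negated=True)
diabetes = ClinicalEntity(38, 46, "disease/disorder", "EXAMPLE-0002",
                          context="family_history")
```

Keeping character offsets rather than a copy of the text lets several annotations refer to the same note without duplicating it; `covered_text` recovers the mention when it is needed.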

cTAKES was built using the UIMA (Unstructured Information Management Architecture) framework and the OpenNLP natural language processing toolkit.^[2]^[3]

Components


Components of cTAKES are specifically trained for the clinical domain, and create rich linguistic and semantic annotations that can be utilized by clinical decision support systems and clinical research.^[4]

These components include:

Named Section identifier
Sentence boundary detector
Rule-based tokenizer
Formatted list identifier
Normalizer
Context dependent tokenizer
Part-of-speech tagger
Phrasal chunker
Dictionary lookup annotator
Context annotator
Negation detector
Uncertainty detector
Subject detector
Dependency parser
Patient smoking status identifier
Drug mention annotator
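
How a few of these stages might chain together can be sketched with a toy pipeline. This is a simplified illustration in Python and does not reflect the cTAKES implementation, which uses trained models and large dictionaries on top of UIMA. The three-entry dictionary, the trigger list, the four-token window, and all function names are assumptions made for this example; the negation step uses a plain trigger-term check in the spirit of NegEx.

```python
import re

# Hypothetical toy resources (assumptions for this sketch only).
DICTIONARY = {
    "chest pain": "sign/symptom",
    "aspirin": "drug",
    "diabetes": "disease/disorder",
}
NEGATION_TRIGGERS = ("no", "denies", "without")


def split_sentences(text):
    """Naive sentence boundary detector: split after ., ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Rule-based tokenizer: lowercase alphanumeric tokens, punctuation dropped."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def lookup(tokens, max_len=3):
    """Dictionary lookup, longest match first; yields (start, end, term, type)."""
    hits, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            term = " ".join(tokens[i:i + n])
            if term in DICTIONARY:
                hits.append((i, i + n, term, DICTIONARY[term]))
                i += n          # skip past the matched term
                break
        else:
            i += 1              # no term starts at this token
    return hits


def is_negated(tokens, start, window=4):
    """Flag a mention if a trigger term occurs in the `window` tokens before it."""
    return any(t in NEGATION_TRIGGERS for t in tokens[max(0, start - window):start])


def annotate(text):
    """Run the stages in order and collect one dict per dictionary hit."""
    results = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        for start, _end, term, sem_type in lookup(tokens):
            results.append({"term": term, "type": sem_type,
                            "negated": is_negated(tokens, start)})
    return results
```

Calling `annotate("Patient denies chest pain. She takes aspirin daily.")` processes each sentence separately, so the trigger "denies" in the first sentence cannot affect the mention of aspirin in the second.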

History


Development of cTAKES began at the Mayo Clinic in 2006. The development team, led by Dr. Guergana Savova and Dr. Christopher Chute, included physicians, computer scientists and software engineers. After its deployment, cTAKES became an integral part of Mayo's clinical data management infrastructure, processing more than 80 million clinical notes.^[5]

When Dr. Savova moved to Boston Children's Hospital in early 2010, the core development team grew to include members there. Further external collaborations include:^[5]

Such collaborations have extended cTAKES' capabilities into other areas such as Temporal Reasoning, Clinical Question Answering, and coreference resolution for the clinical domain.^[5]

In 2010, cTAKES was adopted by the i2b2 program and is a central component of SHARP Area 4.^[5]

In 2013, cTAKES made its first release as an Apache Software Foundation incubator project: cTAKES 3.0.^[citation needed]

In March 2013, cTAKES became an Apache Software Foundation Top Level Project (TLP).^[5]

References


^ Denecke, Kerstin (2015-08-31). "Tools and Resources for Information Extraction". Health Web Science: Social Media Data for Healthcare. Springer. p. 67. ISBN 978-3-319-20582-3 – via Google Books.
^ Khalifa, Abdulrahman; Meystre, Stéphane (2015-12-01). "Adapting existing natural language processing resources for cardiovascular risk factors identification in clinical notes". Journal of Biomedical Informatics. Proceedings of the 2014 i2b2/UTHealth Shared-Tasks and Workshop on Challenges in Natural Language Processing for Clinical Data. 58 (Supplement): S128–S132. doi:10.1016/j.jbi.2015.08.002. PMC 4983192. PMID 26318122.
^ Khudairi, Sally (2017-04-25). "The Apache Software Foundation Announces Apache® cTAKES™ v4.0" (Press release). Forest Hill, MD: The Apache Software Foundation. Globe Newswire. Retrieved 2017-09-20.
^ Savova, Guergana K; Masanz, James J; Ogren, Philip V; Zheng, Jiaping; Sohn, Sunghwan; Kipper-Schuler, Karin C; Chute, Christopher G (2010). "Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications". Journal of the American Medical Informatics Association. 17 (5): 507–513. doi:10.1136/jamia.2009.001560. ISSN 1067-5027. PMC 2995668. PMID 20819853.
^ "History". Apache cTAKES™ - clinical Text Analysis Knowledge Extraction System. 2015-06-22. Retrieved 2018-01-11.

External links


cTAKES Official Website
Apache cTAKES Project Information page from ASF
Abstract (JAMIA)
Open Health Natural Language Processing (OHNLP) Consortium
Strategic Health IT Advanced Research Projects (SHARP) Program
SHARP Area 4 - Secondary Use of EHR Data
The Automated Retrieval Console (ARC)
Health Information Text Extraction (HITEx): developed as part of the i2b2 project. It is a rule-based NLP pipeline based on the GATE framework, developed by Informatics for Integrating Biology and the Bedside.
Computational Language and Education Research toolkit (cleartk) (no longer maintained): developed at the University of Colorado at Boulder, it provides a framework for developing statistical NLP components in Java. It is built on top of Apache UIMA.
NegEx: a tool developed at the University of Pittsburgh to detect negated terms in clinical text. The system uses trigger terms to determine likely negation scenarios within a sentence.
ConText: an extension to NegEx, also developed at the University of Pittsburgh. ConText extends NegEx to detect not only negated concepts but also temporality (recent, historical or hypothetical scenarios) and the subject of the experience (patient or other).
MetaMap (by the United States National Library of Medicine): a comprehensive concept tagging system built on top of the Unified Medical Language System. It requires an active UMLS Metathesaurus License Agreement (and account) for use.
MedEx: a tool for extracting medication information from clinical text. MedEx processes free-text clinical records to recognize medication names and signature information, such as drug dose, frequency, route, and duration. Use is free with a UMLS license. It is a standalone application for Linux and Windows.
SecTag (section tagging hierarchy): recognizes note section headers using NLP, Bayesian, spelling correction, and scoring techniques. Use is free with either a UMLS or LOINC license.
Stanford Named Entity Recognizer (NER): Stanford's NER is a Conditional Random Field sequence model, together with well-engineered features for named entity recognition in English and German.
Stanford CoreNLP: an integrated suite of natural language processing tools for English in Java, including tokenization, part-of-speech tagging, named entity recognition, parsing, and coreference.
