Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains, introducing many powerful LMs such as bio-lm and BioELECTRA. However, the applicability of these methods to real clinical use cases is hindered by the limitation of pre-trained LMs in processing long textual data with thousands of words, which is a common length for a clinical note. In this work, we explore long-range adaptation of such LMs with Longformer, allowing the LMs to capture longer clinical note context. We conduct experiments on three n2c2 challenge datasets and a longitudinal clinical dataset from the Hong Kong Hospital Authority electronic health record (EHR) system to show the effectiveness and generalizability of this concept, achieving a ~10% F1-score improvement. Based on our experiments, we conclude that capturing a longer clinical note interval is beneficial to model performance, but the cut-off interval that achieves optimal performance differs across target variables.
Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, Huan Zhong, MingQian Zhong, Yuk-Yu Nancy Ip, and Pascale Fung. 2022. How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling. In Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI), pages 160–172, Abu Dhabi, United Arab Emirates (Hybrid). Association for Computational Linguistics.
@inproceedings{cahyawijaya-etal-2022-long, title = "How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling", author = "Cahyawijaya, Samuel and Wilie, Bryan and Lovenia, Holy and Zhong, Huan and Zhong, MingQian and Ip, Yuk-Yu Nancy and Fung, Pascale", editor = "Lavelli, Alberto and Holderness, Eben and Jimeno Yepes, Antonio and Minard, Anne-Lyse and Pustejovsky, James and Rinaldi, Fabio", booktitle = "Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)", month = dec, year = "2022", address = "Abu Dhabi, United Arab Emirates (Hybrid)", publisher = "Association for Computational Linguistics", url = "https://aclanthology.org/2022.louhi-1.19/", doi = "10.18653/v1/2022.louhi-1.19", pages = "160--172", abstract = "Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains, introducing many powerful LMs such as bio-lm and BioELECTRA. However, the applicability of these methods to real clinical use cases is hindered, due to the limitation of pre-trained LMs in processing long textual data with thousands of words, which is a common length for a clinical note. In this work, we explore long-range adaptation from such LMs with Longformer, allowing the LMs to capture longer clinical notes context. We conduct experiments on three n2c2 challenges datasets and a longitudinal clinical dataset from Hong Kong Hospital Authority electronic health record (EHR) system to show the effectiveness and generalizability of this concept, achieving {\textasciitilde}10{\%} F1-score improvement. Based on our experiments, we conclude that capturing a longer clinical note interval is beneficial to the model performance, but there are different cut-off intervals to achieve the optimal performance for different target variables."}
<?xml version="1.0" encoding="UTF-8"?><modsCollection xmlns="http://www.loc.gov/mods/v3"><mods ID="cahyawijaya-etal-2022-long"> <titleInfo> <title>How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling</title> </titleInfo> <name type="personal"> <namePart type="given">Samuel</namePart> <namePart type="family">Cahyawijaya</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Bryan</namePart> <namePart type="family">Wilie</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Holy</namePart> <namePart type="family">Lovenia</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Huan</namePart> <namePart type="family">Zhong</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">MingQian</namePart> <namePart type="family">Zhong</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Yuk-Yu</namePart> <namePart type="given">Nancy</namePart> <namePart type="family">Ip</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Pascale</namePart> <namePart type="family">Fung</namePart> <role> <roleTerm authority="marcrelator" type="text">author</roleTerm> </role> </name> <originInfo> <dateIssued>2022-12</dateIssued> </originInfo> <typeOfResource>text</typeOfResource> <relatedItem type="host"> <titleInfo> <title>Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)</title> </titleInfo> <name type="personal"> <namePart type="given">Alberto</namePart> <namePart 
type="family">Lavelli</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Eben</namePart> <namePart type="family">Holderness</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Antonio</namePart> <namePart type="family">Jimeno Yepes</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Anne-Lyse</namePart> <namePart type="family">Minard</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">James</namePart> <namePart type="family">Pustejovsky</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <name type="personal"> <namePart type="given">Fabio</namePart> <namePart type="family">Rinaldi</namePart> <role> <roleTerm authority="marcrelator" type="text">editor</roleTerm> </role> </name> <originInfo> <publisher>Association for Computational Linguistics</publisher> <place> <placeTerm type="text">Abu Dhabi, United Arab Emirates (Hybrid)</placeTerm> </place> </originInfo> <genre authority="marcgt">conference publication</genre> </relatedItem> <abstract>Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains, introducing many powerful LMs such as bio-lm and BioELECTRA. However, the applicability of these methods to real clinical use cases is hindered, due to the limitation of pre-trained LMs in processing long textual data with thousands of words, which is a common length for a clinical note. In this work, we explore long-range adaptation from such LMs with Longformer, allowing the LMs to capture longer clinical notes context. 
We conduct experiments on three n2c2 challenges datasets and a longitudinal clinical dataset from Hong Kong Hospital Authority electronic health record (EHR) system to show the effectiveness and generalizability of this concept, achieving ~10% F1-score improvement. Based on our experiments, we conclude that capturing a longer clinical note interval is beneficial to the model performance, but there are different cut-off intervals to achieve the optimal performance for different target variables.</abstract> <identifier type="citekey">cahyawijaya-etal-2022-long</identifier> <identifier type="doi">10.18653/v1/2022.louhi-1.19</identifier> <location> <url>https://aclanthology.org/2022.louhi-1.19/</url> </location> <part> <date>2022-12</date> <extent unit="page"> <start>160</start> <end>172</end> </extent> </part></mods></modsCollection>
%0 Conference Proceedings%T How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling%A Cahyawijaya, Samuel%A Wilie, Bryan%A Lovenia, Holy%A Zhong, Huan%A Zhong, MingQian%A Ip, Yuk-Yu Nancy%A Fung, Pascale%Y Lavelli, Alberto%Y Holderness, Eben%Y Jimeno Yepes, Antonio%Y Minard, Anne-Lyse%Y Pustejovsky, James%Y Rinaldi, Fabio%S Proceedings of the 13th International Workshop on Health Text Mining and Information Analysis (LOUHI)%D 2022%8 December%I Association for Computational Linguistics%C Abu Dhabi, United Arab Emirates (Hybrid)%F cahyawijaya-etal-2022-long%X Large pre-trained language models (LMs) have been widely adopted in biomedical and clinical domains, introducing many powerful LMs such as bio-lm and BioELECTRA. However, the applicability of these methods to real clinical use cases is hindered, due to the limitation of pre-trained LMs in processing long textual data with thousands of words, which is a common length for a clinical note. In this work, we explore long-range adaptation from such LMs with Longformer, allowing the LMs to capture longer clinical notes context. We conduct experiments on three n2c2 challenges datasets and a longitudinal clinical dataset from Hong Kong Hospital Authority electronic health record (EHR) system to show the effectiveness and generalizability of this concept, achieving ~10% F1-score improvement. Based on our experiments, we conclude that capturing a longer clinical note interval is beneficial to the model performance, but there are different cut-off intervals to achieve the optimal performance for different target variables.%R 10.18653/v1/2022.louhi-1.19%U https://aclanthology.org/2022.louhi-1.19/%U https://doi.org/10.18653/v1/2022.louhi-1.19%P 160-172
[How Long Is Enough? Exploring the Optimal Intervals of Long-Range Clinical Note Language Modeling](https://aclanthology.org/2022.louhi-1.19/) (Cahyawijaya et al., Louhi 2022)