Part of the book series: Communications in Computer and Information Science (CCIS, volume 1964)
Abstract
Cancer is a complex disease marked by uncontrolled cell growth, which can lead to tumors and metastases. Identifying the cancer type is crucial for treatment decisions and patient outcomes. T cell receptors (TCRs) are vital proteins in adaptive immunity: they specifically recognize antigens and play a pivotal role in immune responses, including those against cancer. The diversity of TCRs makes them promising for targeting cancer cells, and advanced sequencing has helped reveal potent anti-cancer TCRs and enable TCR-based therapies. Analyzing these complex biomolecules effectively requires a representation that captures their structural and functional essence. We explore sparse coding for multi-class classification of TCR protein sequences, with cancer categories as the targets. Sparse coding, a machine learning technique, represents data with a small set of informative features, capturing intricate amino acid relationships and subtle sequence patterns. We compute the k-mers of each TCR sequence and apply sparse coding to them to extract key features. Integrating domain knowledge improves the predictive embeddings by incorporating cancer properties such as human leukocyte antigen (HLA) types, gene mutations, clinical traits, immunological features, and epigenetic changes. Applied to a TCR benchmark dataset, our embedding method significantly outperforms the baselines, achieving 99.8% accuracy. Our study underscores the potential of sparse coding for dissecting TCR protein sequences in cancer research.
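The k-mer step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only shows how overlapping k-mers are read off a sequence and why a fixed-length k-mer count vector is sparse. The value k = 3, the function names, and the example sequence are assumptions made for illustration, and a plain k-mer count (spectrum) vector stands in for the sparse coding step, which this page does not specify.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids, in a fixed order so vector positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmers(sequence, k=3):
    """Return all overlapping substrings of length k (sliding window, stride 1)."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def kmer_count_vector(sequence, k=3, alphabet=AMINO_ACIDS):
    """Map a sequence to a vector of k-mer counts of length len(alphabet)**k.

    This is an illustrative stand-in, not the paper's sparse coding:
    a short sequence fills only a handful of the 20**k positions,
    so almost every entry is zero.
    """
    index = {"".join(p): i for i, p in enumerate(product(alphabet, repeat=k))}
    vector = [0] * len(index)
    for kmer, count in Counter(kmers(sequence, k)).items():
        if kmer in index:  # skip k-mers containing non-standard residues
            vector[index[kmer]] = count
    return vector


if __name__ == "__main__":
    example = "CASSLGQAYEQYF"  # made-up CDR3-like string, for illustration only
    vec = kmer_count_vector(example, k=3)
    nonzero = sum(1 for v in vec if v)
    print(f"{len(kmers(example))} k-mers, {nonzero} non-zero of {len(vec)} entries")
```

Vectors of this kind could then be passed to any standard classifier; how the paper combines them with the domain-knowledge features is not described on this page.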
Z. Tayebi and S. Ali—Equal Contribution.
Author information
Authors and Affiliations
Georgia State University, Atlanta, GA, 30302, USA
Zahra Tayebi, Sarwan Ali, Prakash Chourasia, Taslim Murad & Murray Patterson
Corresponding author
Correspondence to Murray Patterson.
Editor information
Editors and Affiliations
School of Automation, Central South University, Changsha, China
Biao Luo
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Long Cheng
Institute of Cyber-Systems and Control, Zhejiang University, Hangzhou, China
Zheng-Guang Wu
School of Automation, Guangdong University of Technology, Guangzhou, China
Hongyi Li
School of Electrical Engineering and Telecommunications, UNSW Sydney, Sydney, NSW, Australia
Chaojie Li
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tayebi, Z., Ali, S., Chourasia, P., Murad, T., Patterson, M. (2024). T Cell Receptor Protein Sequences and Sparse Coding: A Novel Approach to Cancer Classification. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1964. Springer, Singapore. https://doi.org/10.1007/978-981-99-8141-0_17
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8140-3
Online ISBN: 978-981-99-8141-0
eBook Packages: Computer Science, Computer Science (R0)