Springer Nature Link

PDB2Vec: Using 3D Structural Information for Improved Protein Analysis

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 14248)

  • 1018 Accesses

Abstract

In recent years, machine learning methods have shown remarkable results in various protein analysis tasks, including protein classification, folding prediction, and protein-protein interaction prediction. However, most studies use either 3D structures or sequences alone for the downstream classification task, so the combination of both remains comparatively unexplored. This study investigates how incorporating protein sequence and 3D structure information influences protein classification performance. To this end, we propose an embedding method called PDB2Vec that encodes both the 3D structure and protein sequence data to improve the predictive performance of the downstream classification task, and we evaluate it on classification tasks using two well-known datasets, STCRDab and PDBbind. We performed protein classification under three experimental settings: (1) only 3D structural embeddings (called PDB2Vec); (2) sequence embeddings from alignment-free methods from the biology domain, including k-mers, position weight matrices, minimizers, and spaced k-mers; and (3) the combination of structural and sequence-based embeddings. Our experiments demonstrate the importance of incorporating both three-dimensional structural information and amino acid sequence information for protein classification, and show that combining structural and sequence information leads to the best performance. We show that both types of information are complementary and essential for classification tasks.
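The abstract names several alignment-free sequence encodings. As a rough illustration only, the Python sketch below shows generic textbook versions of k-mer counting, minimizers, and spaced (gapped) k-mers, plus concatenation as one simple way to fuse a structural vector with a sequence vector. It is not the PDB2Vec implementation: the structural embedding is not described here, so it appears only as a hypothetical precomputed vector, and the function names, the 0/1 mask format for spaced k-mers, and the use of concatenation for fusion are all assumptions.

```python
from collections import Counter
from itertools import product

# The 20 standard amino acids; k-mers containing other symbols are simply not counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def kmers(seq, k):
    """All overlapping substrings of length k."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def minimizers(seq, k, w):
    """For each window of w consecutive k-mers, keep the lexicographically smallest one."""
    ks = kmers(seq, k)
    return [min(ks[i:i + w]) for i in range(len(ks) - w + 1)]

def spaced_kmers(seq, mask):
    """Gapped k-mers (hypothetical mask format): keep window positions where mask is '1'."""
    span = len(mask)
    keep = [j for j, m in enumerate(mask) if m == "1"]
    return ["".join(seq[i + j] for j in keep) for i in range(len(seq) - span + 1)]

def spectrum(items, k):
    """Fixed-length count vector over all 20**k possible k-mers, in alphabetical order."""
    counts = Counter(items)
    return [counts["".join(p)] for p in product(AMINO_ACIDS, repeat=k)]

def combine(structural, sequence):
    """Assumed fusion step: concatenate a structural vector with a sequence vector."""
    return list(structural) + list(sequence)

if __name__ == "__main__":
    seq = "MKTAYIAKQR"
    seq_vec = spectrum(kmers(seq, 3), 3)      # 8000-dimensional k-mer spectrum
    structure_vec = [0.12, -0.40, 0.88]       # hypothetical placeholder for a structural embedding
    fused = combine(structure_vec, seq_vec)
    print(len(fused))
```

Running the script prints the fused vector length (3 + 8000). Concatenation keeps the two feature sets side by side so a downstream classifier can weight them independently; other fusion choices are equally plausible.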

S. Ali and P. Chourasia: equal contribution.



Author information

Authors and Affiliations

  1. Georgia State University, Atlanta, GA, USA

    Sarwan Ali, Prakash Chourasia & Murray Patterson

Authors
  1. Sarwan Ali
  2. Prakash Chourasia
  3. Murray Patterson

Contributions

Sarwan Ali and Prakash Chourasia: equal contribution.

Corresponding author

Correspondence to Sarwan Ali.

Editor information

Editors and Affiliations

  1. University of North Texas, Denton, TX, USA

    Xuan Guo

  2. University of Southern California, Los Angeles, CA, USA

    Serghei Mangul

  3. Georgia State University, Atlanta, GA, USA

    Murray Patterson

  4. Georgia State University, Atlanta, GA, USA

    Alexander Zelikovsky

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Ali, S., Chourasia, P., Patterson, M. (2023). PDB2Vec: Using 3D Structural Information for Improved Protein Analysis. In: Guo, X., Mangul, S., Patterson, M., Zelikovsky, A. (eds) Bioinformatics Research and Applications. ISBRA 2023. Lecture Notes in Computer Science, vol 14248. Springer, Singapore. https://doi.org/10.1007/978-981-99-7074-2_29
