Cheminformatics analysis and learning in a data pipelining environment

  • Full-length paper
  • Molecular Diversity

Summary

Workflow technology is increasingly applied in discovery informatics to organize and analyze data. SciTegic's Pipeline Pilot is a chemically intelligent implementation of a workflow technology known as data pipelining. It allows scientists to construct and execute workflows from components that encapsulate many cheminformatics-based algorithms. In this paper we review SciTegic's methodology for molecular fingerprints, molecular similarity, molecular clustering, maximal common subgraph search, and Bayesian learning. Case studies show how these methods apply to the analysis of discovery data such as chemical series and high-throughput screening results. The paper demonstrates that the methods are well suited to a wide variety of tasks, such as building and applying predictive models of screening data, identifying molecules for lead optimization, and organizing molecules into families with structural commonality.
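As a rough illustration of the two ideas named in the summary, data pipelining and fingerprint-based similarity, the following minimal Python sketch chains two small components over a stream of molecule records. It is not Pipeline Pilot's API and not the authors' implementation: the record layout, the function names, the integer feature IDs, and the choice of the Tanimoto coefficient as the similarity measure are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, Iterator, List

# Hypothetical record type: one molecule plus its properties.
Record = Dict[str, object]


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient between two sets of fingerprint feature IDs."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def add_similarity(records: Iterable[Record],
                   reference: FrozenSet[int]) -> Iterator[Record]:
    """Component: annotate each record with its similarity to a reference."""
    for rec in records:
        yield {**rec, "similarity": tanimoto(rec["features"], reference)}


def keep_similar(records: Iterable[Record],
                 threshold: float) -> Iterator[Record]:
    """Component: pass on only records at or above the threshold."""
    for rec in records:
        if rec["similarity"] >= threshold:
            yield rec


def run_pipeline(molecules: Iterable[Record],
                 reference: FrozenSet[int],
                 threshold: float = 0.5) -> List[Record]:
    """Chain the components so records flow through one at a time."""
    return list(keep_similar(add_similarity(molecules, reference), threshold))


# Made-up feature IDs standing in for real fingerprint features.
molecules = [
    {"name": "mol_a", "features": frozenset({1, 2, 3, 4})},
    {"name": "mol_b", "features": frozenset({3, 4, 5, 6, 7, 8})},
    {"name": "mol_c", "features": frozenset({9, 10})},
]
reference = frozenset({1, 2, 3, 5})
hits = run_pipeline(molecules, reference)
```

Because each component is a generator, records stream through the chain one by one rather than being materialized between steps, which is the general sense in which such workflows are called pipelines. Here only mol_a, which shares three of five distinct features with the reference, passes the 0.5 threshold.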


Abbreviations

MCSS: maximal common substructure search

ECFP: extended connectivity fingerprints

FCFP: functional class fingerprints

MDDR: MDL Drug Data Report

WDI: World Drug Index

CATS: chemically advanced template search

BKD: binary kernel discrimination

CDK2: cyclin-dependent kinase 2

DHFR: Escherichia coli dihydrofolate reductase


Author information

Authors and Affiliations

  1. SciTegic, Inc., 10188 Telesis Court, Suite 100, San Diego, CA, 92121, USA

    Moises Hassan, Robert D. Brown & David Rogers

  2. Accelrys, Inc., 10188 Telesis Court, Suite 100, San Diego, CA, 92121, USA

    Shikha Varma-O’Brien

Authors
  1. Moises Hassan
  2. Robert D. Brown
  3. Shikha Varma-O’Brien
  4. David Rogers

Corresponding author

Correspondence to Moises Hassan.

About this article

Cite this article

Hassan, M., Brown, R.D., Varma-O’Brien, S. et al. Cheminformatics analysis and learning in a data pipelining environment. Mol Divers 10, 283–299 (2006). https://doi.org/10.1007/s11030-006-9041-5
