Abstract
Databases of chemical reactions contain knowledge about the reactivity of specific reagents. Although information is in general only explicitly available for compounds reported to react, it is possible to derive information about substructures that do not react in the reported reactions. Both types of information (positive and negative) can be used to train machine learning techniques to predict if a compound reacts or not with a specific reagent. The whole process was implemented with two databases of reactions, one involving BuNH2 as the reagent, and the other NaCNBH3. Negative information was derived using MOLMAP molecular descriptors, and classification models were developed with Random Forests also based on MOLMAP descriptors. MOLMAP descriptors were based exclusively on calculated physicochemical features of molecules. Correct predictions were achieved for ∼90% of independent test sets. While NaCNBH3 is a selective reducing reagent widely used in organic synthesis, BuNH2 is a nucleophile that mimics the reactivity of the lysine side chain (involved in an initiating step of the mechanism leading to skin sensitization).
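The idea of deriving positive and negative examples from reported reactions can be illustrated with a minimal sketch. It is not the authors' procedure: the paper derives negative information with MOLMAP descriptors, whereas this toy version simply compares lists of functional-group names before and after a reaction. The function `label_groups`, the record layout, and the two reaction records are all hypothetical and made up for illustration.

```python
from collections import Counter

def label_groups(reactant_groups, product_groups):
    """Split a reactant's functional groups into reacted and unchanged ones.

    Assumption: a group that disappears from the product has reacted
    (positive example); a group present before and after has not reacted
    (negative example). Real molecules need a proper structural comparison.
    """
    before = Counter(reactant_groups)
    after = Counter(product_groups)
    reacted = before - after      # groups consumed in the reaction
    unchanged = before & after    # groups that survive the reaction
    return sorted(reacted.elements()), sorted(unchanged.elements())

# Hypothetical reaction records (invented data, not taken from any database).
reactions = [
    {"reactant": ["imine", "ester"], "product": ["amine", "ester"]},
    {"reactant": ["aldehyde", "nitrile"], "product": ["alcohol", "nitrile"]},
]

# Build a labeled training set: 1 = reacts with the reagent, 0 = does not.
training_set = []
for rxn in reactions:
    positives, negatives = label_groups(rxn["reactant"], rxn["product"])
    training_set += [(g, 1) for g in positives] + [(g, 0) for g in negatives]

for group, label in training_set:
    print(group, label)
```

In a setting like the one the abstract describes, each labeled entry would be represented by a numerical descriptor vector rather than a group name, and the labeled set would then be passed to a classifier such as a Random Forest.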
Abbreviations
- MOLMAP: MOLecular maps of atom-level properties
- BuNH2: Butylamine
- RF: Random forest
- VOC: Volatile organic compounds
- QSAR: Quantitative structure activity relationship
- OOB: Out of bag
- SVM: Support vector machines
- ROC: Receiver operating characteristic
- SOM: Self organizing maps
- HTS: High-throughput screening
Acknowledgments
G.C. and S.G. acknowledge Fundação para a Ciência e Tecnologia (Lisbon, Portugal) for financial support under grants SFRH/BD/18354/2004 and SFRH/BPD/14475/2003. Molecular Networks GmbH (Erlangen, Germany) and Infochem (Munich, Germany) are acknowledged for access to the PETRA program and to subsets of chemical reactions from the SPRESI database, respectively.
Author information
Authors and Affiliations
REQUIMTE, CQFB, Departamento de Química, Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, 2829-516, Caparica, Portugal
Gonçalo V. S. M. Carrera, Sunil Gupta & João Aires-de-Sousa
Corresponding author
Correspondence to João Aires-de-Sousa.
About this article
Cite this article
Carrera, G.V.S.M., Gupta, S. & Aires-de-Sousa, J. Machine learning of chemical reactivity from databases of organic reactions. J Comput Aided Mol Des 23, 419–429 (2009). https://doi.org/10.1007/s10822-009-9275-2