PubChem was released in 2004 as a component of the Molecular Libraries Program (MLP) of the NIH. As of November 2015, PubChem contains more than 150 million depositor-provided substance descriptions, 60 million unique chemical structures, and 225 million biological activity test results (from over 1 million assay experiments performed on more than 2 million small molecules covering almost 10,000 unique protein target sequences that correspond to more than 5,000 genes). It also contains RNA interference (RNAi) screening assays that target over 15,000 genes.[3]
As of August 2018, PubChem contains 247.3 million substance descriptions and 96.5 million unique chemical structures, contributed by 629 data sources from 40 countries. It also contains 237 million bioactivity test results from 1.25 million biological assays, covering more than 10,000 target protein sequences.[4]
As of 2020, with data integration from over 100 new sources, PubChem contains more than 293 million depositor-provided substance descriptions, 111 million unique chemical structures, and 271 million bioactivity data points from 1.2 million biological assay experiments.[5]
PubChem consists of three dynamically growing primary databases. As of 5 November 2020 (number of BioAssays is unchanged):
Compounds, 111 million entries[5] (up from 94 million entries in 2017[4]), contains pure and characterized chemical compounds.[6]
Substances, 293 million entries[5] (up from 236 million entries in 2017[7] and 163 million in Sept. 2014[8]), also contains mixtures, extracts, complexes, and uncharacterized substances.
PubChem contains its own online molecule editor with SMILES/SMARTS and InChI support that allows the import and export of all common chemical file formats to search for structures and fragments.
Each hit provides information about synonyms, chemical properties, chemical structure including SMILES and InChI strings, bioactivity, and links to structurally related compounds and other NCBI databases like PubMed.
In the text search form, the database fields can be searched by adding the field name in square brackets to the search term. A numeric range is represented by two numbers separated by a colon. The search terms and field names are case-insensitive. Parentheses and the logical operators AND, OR, and NOT can be used. AND is assumed if no operator is used.
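The syntax rules above can be illustrated with a small string-building sketch in Python. It only assembles query text and does not contact PubChem. The helper names (field_term, range_term, combine, group) are hypothetical, and the field names mw (molecular weight) and hbdc (hydrogen bond donor count) are assumed here for illustration only.

```python
# Minimal sketch: build text-search strings following the rules described above.
# All helper names are hypothetical; the field names "mw" and "hbdc" are
# illustrative assumptions, not a list taken from this section.

ALLOWED_OPERATORS = {"AND", "OR", "NOT"}


def field_term(term, field):
    """Append the field name in square brackets to a search term."""
    return f"{term}[{field}]"


def range_term(low, high, field):
    """Express a numeric range as two numbers separated by a colon."""
    return field_term(f"{low}:{high}", field)


def combine(terms, operator=None):
    """Join terms with a logical operator.

    With no operator the terms are separated by spaces only,
    relying on the rule that AND is assumed.
    """
    if operator is None:
        return " ".join(terms)
    op = operator.upper()
    if op not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator}")
    return f" {op} ".join(terms)


def group(expression):
    """Wrap an expression in parentheses to control evaluation order."""
    return f"({expression})"


# Two range restrictions with the implicit AND.
implicit_and = combine([range_term(0, 500, "mw"), range_term(0, 5, "hbdc")])
print(implicit_and)

# A parenthesized OR group, combined with a range via an explicit AND.
either_name = group(combine(["aspirin", "ibuprofen"], "OR"))
explicit = combine([either_name, range_term(0, 500, "mw")], "AND")
print(explicit)
```

Because search terms and field names are case-insensitive, writing the field as MW or mw should select the same field; the sketch leaves the case of its inputs unchanged.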