Chemical Component Dictionary

The Chemical Component Dictionary is an external reference file describing all residue and small molecule components found in PDB entries. This dictionary contains detailed chemical descriptions for standard and modified amino acids/nucleotides, small molecule ligands, and solvent molecules. Each chemical definition includes descriptions of chemical properties such as stereochemical assignments, chemical descriptors (SMILES & InChI), systematic chemical names, and idealized coordinates (generated using Molecular Networks' Corina, and if there are issues, OpenEye's OMEGA).

The dictionary is organized by the 3-character alphanumeric code that PDB assigns to each chemical component. New chemical component definitions appear in the dictionary as the entries in which they are observed are released in the PDB archive; consequently, the dictionary is updated with each weekly PDB release. The dictionary is regularly reviewed and remediated. Any obsoleted components remain in the dictionary marked with status OBS.

Users can search and browse the Chemical Component Dictionary using resources such as PDBeChem and RCSB PDB Chemical Search.

The entire Chemical Component Dictionary and the companion dictionary of amino acid protonation variants can be downloaded from the wwPDB ftp site:

Chemical Component Dictionary: mmCIF (gz) | SDF/MOL (gz)
Protonation Variants Companion Dictionary: mmCIF (gz)
Chemical Component Model data file: mmCIF (gz)

In addition, SMILES/InChI/InChIKey data files can be downloaded here:

Please note that these files are large, and may take a while to download. You may wish to right-click on the "plain text" link to save the file.

The dictionary of protonation variants provides additional nomenclature information for the protonation states of standard amino acids in N-terminal, C-terminal, and free forms, and includes common side chain protonation states. This extension dictionary uses longer identifier codes to distinguish the various protonation forms of the standard amino acids. For instance, the identifier code ARG_LFOH_DHH12 identifies the arginine variant with a neutral peptide unit and side chain protonated at NH1. The extended identifier codes are not compatible with the 3-character format restrictions for the residue identifier in the PDB format, so these codes do not currently appear in PDB files. In PDB entries, protonated residues are identified by the 3-character code of their parent amino acid; however, the atom nomenclature for protonated forms will be taken from the variant dictionary definitions.
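
The relationship between an extended variant code and the 3-character parent code can be sketched in a few lines of Python. This is only an illustration: it assumes that the parent code is the text before the first underscore (as in ARG_LFOH_DHH12), and the function names are hypothetical rather than part of any wwPDB tool.

```python
def parent_code(variant_id: str) -> str:
    """Return the parent amino acid code of a protonation-variant identifier.

    Assumption: the parent code is the text before the first underscore,
    as in ARG_LFOH_DHH12 -> ARG.
    """
    return variant_id.split("_", 1)[0]


def fits_pdb_residue_field(code: str) -> bool:
    """True if the code fits the 3-character residue identifier of the PDB format."""
    return 1 <= len(code) <= 3


variant = "ARG_LFOH_DHH12"
print(parent_code(variant))                           # ARG
print(fits_pdb_residue_field(variant))                # False
print(fits_pdb_residue_field(parent_code(variant)))   # True
```

The length check mirrors the reason given above for why the extended codes do not appear in PDB files.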

The Chemical Component Model data file contains the matching chemical structures in the Chemical Component Dictionary and the Cambridge Structural Database (CSD) archive. This reference file includes accession code correspondences, Cartesian coordinates and R-value, data-collection temperature and a disorder flag, SMILES and InChI descriptors, and a Digital Object Identifier (DOI) for the citation associated with the CSD entry.

Prior to development of the Chemical Component Dictionary, PDB chemical information was solely in the form of connection tables. This older representation, called the PDB HET dictionary, is still made available on the wwPDB ftp site (download). PDB HET format dictionary entries for individual components are available at https://files.wwpdb.org/pub/pdb/data/monomers/.

The Chemical Component Dictionary was formerly called the HET Group Dictionary.

Descriptions of chemical components in mmCIF and PDB formats are provided below.

J.D. Westbrook, C. Shao, Z. Feng, M. Zhuravleva, S. Velankar, J. Young (2014) The chemical component dictionary: complete descriptions of constituent molecules in experimentally determined 3D macromolecules in the Protein Data Bank. Bioinformatics, doi:10.1093/bioinformatics/btu789

PDBeChem

PDBeChem [1] offers a wide range of possibilities for searching and exploring the dictionary:

  • Search for a particular 3-letter code
  • Search using part of the name
  • Search for a formula range
  • Search for a substructure
  • Search for a fragment expression

Users can also search by references in macromolecules, molecule classification, and atom energy type.

A generic browsing interface lets users follow links that are available from every record in order to navigate through the relationships of the dictionary. For example, a relationship link can be followed to view the atoms of a ligand and then for a particular atom, its bonds and energy types and so on.

For more information, please see
https://www.ebi.ac.uk/msd-srv/msdchem/ligand/help.htm

RCSB PDB Chemical Search

RCSB PDB Chemical Search can be used to navigate the Chemical Component Dictionary.

The Chemical Similarity search allows you to find small molecules in the PDB archive that are similar to your query. These molecules are found in the Chemical Component Dictionary (CCD) and the Biologically Interesting Molecule Reference Dictionary (BIRD). You can search using properties such as molecular formula or chemical descriptors.

You can use this search to find chemical components (for example, drugs, inhibitors, modified residues, or building blocks such as amino acids and nucleotides) that:

  • are similar to the query formula or descriptor (e.g., differing by one or two atoms or functional groups),
  • contain the query formula or descriptor as a substructure within a larger molecule, or
  • exactly or very closely match the query formula or descriptor.

Chemical Components in mmCIF Format

The mmCIF format combines collections of related data items (tokens) into categories. A category is essentially a table in which each token represents a column of the table. The question mark (?) is used to mark an item value as missing. A period (.) may be used to indicate that there is no appropriate value for the item or that a value has been intentionally omitted.

Vectors and tables of data may be encoded in mmCIF using a loop_ directive. To build a table, the data item names corresponding to the table columns are preceded by the loop_ directive, and followed by the corresponding rows of data.
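
As a rough illustration of the loop_ layout and the two placeholder values, the sketch below parses one simple loop into a list of row dictionaries. It is a minimal reader written for this page, not a complete mmCIF parser: it assumes one row per line, does not handle multi-line (semicolon-delimited) values or values containing unbalanced quote characters, and the names `parse_loop`, `convert`, and `NOT_APPLICABLE` are hypothetical. The sample rows are taken from the bond table of component 174, except that the `?` in the last row was inserted for illustration.

```python
import shlex

NOT_APPLICABLE = object()  # stands for "." (no appropriate value)


def convert(raw):
    """Map the mmCIF placeholders to Python values; keep everything else as text."""
    if raw == "?":
        return None            # value is missing
    if raw == ".":
        return NOT_APPLICABLE  # not applicable or intentionally omitted
    return raw


def parse_loop(text):
    """Parse one simple loop_ block into a list of {item name: value} rows."""
    names, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line in ("#", "loop_"):
            continue
        if line.startswith("_"):
            names.append(line)          # column (data item) name
            continue
        # shlex.split keeps quoted values such as "OpenEye OEToolkits" together.
        values = [convert(v) for v in shlex.split(line)]
        if len(values) != len(names):
            raise ValueError(f"expected {len(names)} values, got {len(values)}: {line!r}")
        rows.append(dict(zip(names, values)))
    return rows


SAMPLE = """
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.value_order
_chem_comp_bond.pdbx_aromatic_flag
174 CL4 C4 SING N
174 C4  C5 DOUB Y
174 C   O2 DOUB ?
"""

rows = parse_loop(SAMPLE)
print(rows[0]["_chem_comp_bond.atom_id_1"])           # CL4
print(rows[2]["_chem_comp_bond.pdbx_aromatic_flag"])  # None
```

For production use, an established mmCIF library is a safer choice than a hand-written reader like this one.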

A detailed description of the mmCIF syntax and logic structure is available.

In the Chemical Component Dictionary, each chemical component is defined by sets of tokens in five categories. The tables below list the tokens of each category with their definitions and examples.

chem_comp
Token | Definition | Example
_chem_comp.id | The alphanumeric code for the chemical component. | HYP
_chem_comp.name | The name of the chemical component. | 4-HYDROXYPROLINE
_chem_comp.type | The type of monomer. | L-peptide linking
_chem_comp.pdbx_type | A preliminary internal classification used by PDB. | ATOMP
_chem_comp.formula | The chemical formula of the chemical component. | C5 H9 N1 O3
_chem_comp.mon_nstd_parent_comp_id | The identifier for the parent component of the nonstandard component. May be a comma-separated list if this component is derived from multiple components. | PRO
_chem_comp.pdbx_synonyms | Synonym list for the non-standard residue. | HYDROXYPROLINE
_chem_comp.pdbx_formal_charge | The formal charge on the chemical component. | +1
_chem_comp.pdbx_initial_date | Date the chemical component was added to the database. | yyyy-mm-dd
_chem_comp.pdbx_modified_date | Date that the component was last modified. | yyyy-mm-dd
_chem_comp.pdbx_ambiguous_flag | For ligands with unconventional bonding (i.e. ligands with transition metal complexes). | code
_chem_comp.pdbx_release_status | Status of ligand (released, hold, obsoleted). | REL
_chem_comp.pdbx_replaced_by | Identifies the _chem_comp.id of the new component that has replaced this component. | 3-letter identifier
_chem_comp.pdbx_replaces | Identifies the _chem_comp.id of the component this entry replaces. Converse of _replaced_by. | 3-letter identifier
_chem_comp.formula_weight | Formula mass of the chemical component in Daltons. | 131.131
_chem_comp.one_letter_code | Reports the one-letter code of the component, if applicable. | one-letter identifier
_chem_comp.three_letter_code | Reports the three-letter code of the component, if applicable. | ATP
_chem_comp.pdbx_model_coordinates_details | Provides additional details about the model coordinates in the component definition. | text
_chem_comp.pdbx_model_coordinates_missing_flag | Identifies if model coordinates are missing in this definition. | Y or N
_chem_comp.pdbx_ideal_coordinates_details | Identifies the source of the ideal coordinates in the component definition. | text
_chem_comp.pdbx_ideal_coordinates_missing_flag | Identifies if ideal coordinates are missing in this definition. | Y or N
_chem_comp.pdbx_model_coordinates_db_code | Identifies the PDB database code from which the heavy atom model coordinates were obtained. | PDB entry id
_chem_comp.pdbx_processing_site | Identifies the deposition site that processed this chemical component definition. | RCSB PDB, PDBj, PDBe

chem_comp_atom

Tokens in this section are looped through for each atom in the chemical component.

Token | Definition | Example
_chem_comp_atom.comp_id | Same as _chem_comp.id | HYP
_chem_comp_atom.atom_id | Identifier for each atom in the chemical component (new format). | CA
_chem_comp_atom.alt_atom_id | Previous format of identifier for each atom in the chemical component. | CA
_chem_comp_atom.type_symbol | The element type for each atom in the chemical component. | C, O, N, etc.
_chem_comp_atom.charge | The formal charge assigned to each atom in the chemical component. | 0
_chem_comp_atom.pdbx_align | Determines which column the atom name appears within the PDB coordinate files. The possible values are 0 or 1. | 0 or 1
_chem_comp_atom.pdbx_aromatic_flag | Defines atoms in an aromatic moiety. | Y or N
_chem_comp_atom.pdbx_leaving_atom_flag | Flags atoms with "leaving" capability. | Y or N
_chem_comp_atom.pdbx_stereo_config | Defines the stereochemical configuration of the chiral center atom. | R or S or N
_chem_comp_atom.model_Cartn_x | The x component of the coordinates for each atom specified as orthogonal angstroms. | 26.056
_chem_comp_atom.model_Cartn_y | The y component of the coordinates for each atom specified as orthogonal angstroms. | 5.609
_chem_comp_atom.model_Cartn_z | The z component of the coordinates for each atom specified as orthogonal angstroms. | 5.594
_chem_comp_atom.pdbx_model_Cartn_x_ideal | Computed idealized coordinates, x component of the vector (in Angstroms). | number
_chem_comp_atom.pdbx_model_Cartn_y_ideal | Computed idealized coordinates, y component of the vector (in Angstroms). | number
_chem_comp_atom.pdbx_model_Cartn_z_ideal | Computed idealized coordinates, z component of the vector (in Angstroms). | number
_chem_comp_atom.pdbx_ordinal | Ordinal index for the chemical component atom list. | 1 (integer)

chem_comp_bond

Tokens in this section are looped through for each bond in the chemical component.

Token | Definition | Example
_chem_comp_bond.comp_id | Same as _chem_comp.id | HYP
_chem_comp_bond.atom_id_1 | The ID of the first of the two atoms that define the bond. | N
_chem_comp_bond.atom_id_2 | The ID of the second of the two atoms that define the bond. | CA
_chem_comp_bond.value_order | The bond order of the chemical bond associated with the specified atoms. | SING
_chem_comp_bond.pdbx_aromatic_flag | Defines aromatic bonds. | Y or N
_chem_comp_bond.pdbx_stereo_config | Defines stereochemical bonds. | Y or N
_chem_comp_bond.pdbx_ordinal | Ordinal index for the component bond list. | 1 (integer)

pdbx_chem_comp_descriptor
Token | Definition | Example
_pdbx_chem_comp_descriptor.comp_id | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. | text
_pdbx_chem_comp_descriptor.type | The type of descriptor (for example SMILES, SMILES_CANONICAL, InChI, or InChIKey). | text
_pdbx_chem_comp_descriptor.program | The name of the program or library used to compute the descriptor. | text
_pdbx_chem_comp_descriptor.program_version | The version of the program or library used to compute the descriptor. | version number
_pdbx_chem_comp_descriptor.descriptor | The chemical descriptor value for this component. | code

pdbx_chem_comp_identifier
Token | Definition | Example
_pdbx_chem_comp_identifier.comp_id | This data item is a pointer to _chem_comp.id in the CHEM_COMP category. | text
_pdbx_chem_comp_identifier.type | Contains the identifier type. | CAS Reg No. or PUBCHEM, etc.
_pdbx_chem_comp_identifier.program | The name of the program or library used to compute the identifier. | OpenEye OECHEM program, etc.
_pdbx_chem_comp_identifier.program_version | The version of the program or library used to compute the identifier. | v1.2 (numbers)
_pdbx_chem_comp_identifier.identifier | Contains the identifier value for this chemical component. | text

In a PDB entry, the mmCIF category chem_comp is used to describe the chemical components in the file. The chemical name is given in _chem_comp.name, the chemical formula in _chem_comp.formula, and the molecular weight in _chem_comp.formula_weight.

For example, the mmCIF file for PDB entry 1t5d contains the ligand 4-Chloro-benzoic Acid (ID code: 174):

data_174
#
_chem_comp.id                                    174
_chem_comp.name                                  "4-CHLORO-BENZOIC ACID"
_chem_comp.type                                  NON-POLYMER
_chem_comp.pdbx_type                             HETAIN
_chem_comp.formula                               "C7 H5 Cl O2"
_chem_comp.mon_nstd_parent_comp_id               ?
_chem_comp.pdbx_synonyms                         ?
_chem_comp.pdbx_formal_charge                    0
_chem_comp.pdbx_initial_date                     2004-05-07
_chem_comp.pdbx_modified_date                    2008-04-29
_chem_comp.pdbx_ambiguous_flag                   N
_chem_comp.pdbx_release_status                   REL
_chem_comp.pdbx_replaced_by                      ?
_chem_comp.pdbx_replaces                         ?
_chem_comp.formula_weight                        156.566
_chem_comp.one_letter_code                       ?
_chem_comp.three_letter_code                     174
_chem_comp.pdbx_model_coordinates_details        ?
_chem_comp.pdbx_model_coordinates_missing_flag   N
_chem_comp.pdbx_ideal_coordinates_details        ?
_chem_comp.pdbx_ideal_coordinates_missing_flag   N
_chem_comp.pdbx_model_coordinates_db_code        1T5D
_chem_comp.pdbx_processing_site                  RCSB

Further information describing this residue (174) is then provided in the Chemical Component Dictionary (see the example below).
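
To show how the name, formula, and molecular weight might be read from such a block, here is a small Python sketch. It assumes every item sits on a single line as a name followed by one (optionally quoted) value, which holds for the excerpt above but not for mmCIF in general; `parse_items` is a hypothetical helper, and the sample is an abbreviated copy of the 174 block.

```python
import shlex

SAMPLE = """data_174
#
_chem_comp.id                 174
_chem_comp.name               "4-CHLORO-BENZOIC ACID"
_chem_comp.type               NON-POLYMER
_chem_comp.formula            "C7 H5 Cl O2"
_chem_comp.pdbx_synonyms      ?
_chem_comp.formula_weight     156.566
"""


def parse_items(text):
    """Collect single-line 'name value' items into a dictionary.

    A '?' value is stored as None. Lines that do not start with an
    item name (data_ header, '#') are skipped.
    """
    items = {}
    for line in text.splitlines():
        if not line.startswith("_"):
            continue
        name, value = shlex.split(line)  # raises ValueError on unexpected layouts
        items[name] = None if value == "?" else value
    return items


comp = parse_items(SAMPLE)
print(comp["_chem_comp.name"])                   # 4-CHLORO-BENZOIC ACID
print(comp["_chem_comp.formula"])                # C7 H5 Cl O2
print(float(comp["_chem_comp.formula_weight"]))  # 156.566
print(comp["_chem_comp.pdbx_synonyms"])          # None
```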

Chemical Components in PDB Format

The heterogen section of a PDB coordinate file describes ligands in the entry. The chemical name of the ligand is given in the HETNAM record and the chemical formula is given in the FORMUL record. Any synonyms for the chemical name are given in the HETSYN records.

For example, the PDB format file for PDB entry 1t5d contains the ligand 4-Chloro-benzoic Acid (ID code: 174):

HET        174             15
HETNAM     174 4-CHLORO-BENZOIC  ACID
FORMUL     174    C7 H5 CL O2

Further information describing this residue (174) is then provided in the Chemical Component Dictionary (see the example below).

Please refer to the PDB File Format Guide for further description.
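
The mapping from HETNAM, HETSYN, and FORMUL records to a ligand's name, synonyms, and formula can be sketched as below. This is a simplified reader for records laid out like the excerpt above: it splits on whitespace and treats the first field after the record name as the ligand ID. Records in real PDB files use fixed columns and may carry continuation or component numbers, so the PDB File Format Guide remains the authority; `parse_het_records` is a hypothetical name.

```python
RECORD_KEYS = {"HETNAM": "name", "HETSYN": "synonyms", "FORMUL": "formula"}


def parse_het_records(lines):
    """Group HETNAM/HETSYN/FORMUL text by ligand ID (simplified layout only)."""
    ligands = {}
    for line in lines:
        record = line[:6].strip()      # record name occupies the first six columns
        fields = line[6:].split()
        if not fields:
            continue
        het_id = fields[0]
        entry = ligands.setdefault(het_id, {})
        key = RECORD_KEYS.get(record)
        if key is not None:
            # Re-join the remaining fields, collapsing runs of spaces.
            entry[key] = " ".join(fields[1:])
    return ligands


SAMPLE = [
    "HET        174             15",
    "HETNAM     174 4-CHLORO-BENZOIC  ACID",
    "FORMUL     174    C7 H5 CL O2",
]

ligands = parse_het_records(SAMPLE)
print(ligands["174"]["name"])     # 4-CHLORO-BENZOIC ACID
print(ligands["174"]["formula"])  # C7 H5 CL O2
```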

Examples

Chemical Component Dictionary (mmCIF Format)

data_174
#
_chem_comp.id                                    174
_chem_comp.name                                  "4-CHLORO-BENZOIC ACID"
_chem_comp.type                                  NON-POLYMER
_chem_comp.pdbx_type                             HETAIN
_chem_comp.formula                               "C7 H5 Cl O2"
_chem_comp.mon_nstd_parent_comp_id               ?
_chem_comp.pdbx_synonyms                         ?
_chem_comp.pdbx_formal_charge                    0
_chem_comp.pdbx_initial_date                     2004-05-07
_chem_comp.pdbx_modified_date                    2008-04-29
_chem_comp.pdbx_ambiguous_flag                   N
_chem_comp.pdbx_release_status                   REL
_chem_comp.pdbx_replaced_by                      ?
_chem_comp.pdbx_replaces                         ?
_chem_comp.formula_weight                        156.566
_chem_comp.one_letter_code                       ?
_chem_comp.three_letter_code                     174
_chem_comp.pdbx_model_coordinates_details        ?
_chem_comp.pdbx_model_coordinates_missing_flag   N
_chem_comp.pdbx_ideal_coordinates_details        ?
_chem_comp.pdbx_ideal_coordinates_missing_flag   N
_chem_comp.pdbx_model_coordinates_db_code        1T5D
_chem_comp.pdbx_processing_site                  RCSB
#
loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.alt_atom_id
_chem_comp_atom.type_symbol
_chem_comp_atom.charge
_chem_comp_atom.pdbx_align
_chem_comp_atom.pdbx_aromatic_flag
_chem_comp_atom.pdbx_leaving_atom_flag
_chem_comp_atom.pdbx_stereo_config
_chem_comp_atom.model_Cartn_x
_chem_comp_atom.model_Cartn_y
_chem_comp_atom.model_Cartn_z
_chem_comp_atom.pdbx_model_Cartn_x_ideal
_chem_comp_atom.pdbx_model_Cartn_y_ideal
_chem_comp_atom.pdbx_model_Cartn_z_ideal
_chem_comp_atom.pdbx_ordinal
174 CL4 CL4 CL 0 0 N N N -19.787 95.862 18.541 0.032  -0.000 -3.376 1
174 C4  C4  C  0 1 Y N N -19.932 94.201 19.219 0.005  -0.000 -1.640 2
174 C5  C5  C  0 1 Y N N -18.817 93.715 19.901 -1.205 0.000  -0.969 3
174 C6  C6  C  0 1 Y N N -18.847 92.452 20.466 -1.233 0.000  0.409  4
174 C3  C3  C  0 1 Y N N -21.099 93.428 19.089 1.196  -0.000 -0.932 5
174 C2  C2  C  0 1 Y N N -21.127 92.158 19.664 1.182  0.004  0.446  6
174 C1  C1  C  0 1 Y N N -19.996 91.681 20.342 -0.036 -0.000 1.128  7
174 C   C   C  0 1 N N N -19.962 90.330 20.989 -0.059 -0.000 2.605  8
174 O1  O1  O  0 1 N N N -20.968 89.592 20.924 1.097  -0.001 3.296  9
174 O2  O2  O  0 1 N N N -18.919 89.991 21.597 -1.120 0.000  3.196  10
174 H5  H5  H  0 1 N N N -17.907 94.332 19.994 -2.130 0.001  -1.526 11
174 H6  H6  H  0 1 N N N -17.967 92.065 21.008 -2.178 0.000  0.931  12
174 H3  H3  H  0 1 N N N -21.978 93.812 18.545 2.138  -0.001 -1.461 13
174 H2  H2  H  0 1 N N N -22.035 91.537 19.583 2.110  0.003  0.997  14
174 HO1 HO1 H  0 1 N N N -20.946 88.735 21.334 1.082  -0.001 4.263  15
#
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
_chem_comp_bond.value_order
_chem_comp_bond.pdbx_aromatic_flag
_chem_comp_bond.pdbx_stereo_config
_chem_comp_bond.pdbx_ordinal
174 CL4 C4  SING N N 1
174 C4  C5  DOUB Y N 2
174 C4  C3  SING Y N 3
174 C5  C6  SING Y N 4
174 C5  H5  SING N N 5
174 C6  C1  DOUB Y N 6
174 C6  H6  SING N N 7
174 C3  C2  DOUB Y N 8
174 C3  H3  SING N N 9
174 C2  C1  SING Y N 10
174 C2  H2  SING N N 11
174 C1  C   SING N N 12
174 C   O1  SING N N 13
174 C   O2  DOUB N N 14
174 O1  HO1 SING N N 15
#
loop_
_pdbx_chem_comp_descriptor.comp_id
_pdbx_chem_comp_descriptor.type
_pdbx_chem_comp_descriptor.program
_pdbx_chem_comp_descriptor.program_version
_pdbx_chem_comp_descriptor.descriptor
174 SMILES            ACDLabs               10.04  O=C(O)c1ccc(Cl)cc1
174 SMILES_CANONICAL  CACTVS                3.341  OC(=O)c1ccc(Cl)cc1
174 SMILES            CACTVS                3.341  OC(=O)c1ccc(Cl)cc1
174 SMILES_CANONICAL  "OpenEye OEToolkits"  1.5.0  c1cc(ccc1C(=O)O)Cl
174 SMILES            "OpenEye OEToolkits"  1.5.0  c1cc(ccc1C(=O)O)Cl
174 InChI             InChI                 1.02b  InChI=1/C7H5ClO2/c8-6-3-1-5(2-4-6)7(9)10/h1-4H,(H,9,10)/f/h9H
174 InChIKey          InChI                 1.02b  XRHGYUZYPHTUJZ-BGGKNDAXCA
#
loop_
_pdbx_chem_comp_identifier.comp_id
_pdbx_chem_comp_identifier.type
_pdbx_chem_comp_identifier.program
_pdbx_chem_comp_identifier.program_version
_pdbx_chem_comp_identifier.identifier
174 "SYSTEMATIC NAME" ACDLabs              10.04 "4-chlorobenzoic acid"
174 "SYSTEMATIC NAME" "OpenEye OEToolkits" 1.5.0 "4-chlorobenzoic acid"
#
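
As a consistency check on this example, the element counts in the chem_comp_atom loop can be compared with _chem_comp.formula. The Python sketch below counts the type_symbol values of the 15 atoms and prints them in Hill order (carbon, hydrogen, then the remaining elements alphabetically). The choice to omit a count of 1 follows the formula string in this example and is an assumption about formatting, and `hill_formula` is a hypothetical helper.

```python
from collections import Counter

# (atom_id, type_symbol) pairs copied from the chem_comp_atom loop of component 174.
ATOMS = [
    ("CL4", "CL"), ("C4", "C"), ("C5", "C"), ("C6", "C"), ("C3", "C"),
    ("C2", "C"), ("C1", "C"), ("C", "C"), ("O1", "O"), ("O2", "O"),
    ("H5", "H"), ("H6", "H"), ("H3", "H"), ("H2", "H"), ("HO1", "H"),
]


def hill_formula(type_symbols):
    """Build a space-separated formula string from element symbols."""
    # The dictionary stores symbols in upper case (CL); capitalize() gives Cl.
    counts = Counter(symbol.capitalize() for symbol in type_symbols)
    if "C" in counts:
        first = [e for e in ("C", "H") if e in counts]
        order = first + sorted(e for e in counts if e not in first)
    else:
        order = sorted(counts)
    return " ".join(e if counts[e] == 1 else f"{e}{counts[e]}" for e in order)


formula = hill_formula(symbol for _, symbol in ATOMS)
print(formula)  # C7 H5 Cl O2
```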

Heterogen List (PDB Format)

RESIDUE   174     15
CONECT      CL4    1 C4
CONECT      C4     3 CL4     C5   C3
CONECT      C5     3 C4      C6   H5
CONECT      C6     3 C5      C1   H6
CONECT      C3     3 C4      C2   H3
CONECT      C2     3 C3      C1   H2
CONECT      C1     3 C6      C2   C
CONECT      C      3 C1      O1   O2
CONECT      O1     2 C       HO1
CONECT      O2     1 C
CONECT      H5     1 C5
CONECT      H6     1 C6
CONECT      H3     1 C3
CONECT      H2     1 C2
CONECT      HO1    1 O1
END
HET        174             15
HETNAM     174 4-CHLORO-BENZOIC    ACID
FORMUL     174    C7 H5      Cl1  O2
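
The connection table above lists, for each atom, its neighbor count and neighbor names, which is the same connectivity that the chem_comp_bond loop of the mmCIF example records bond by bond (the connection table carries no bond orders). The Python sketch below derives the neighbor lists from the 15 bonds of component 174. It is only an illustration: `connection_table` is a hypothetical helper, and the printed lines do not reproduce the exact column layout of the HET dictionary format.

```python
# (atom_id_1, atom_id_2) pairs copied from the chem_comp_bond loop of component 174.
BONDS = [
    ("CL4", "C4"), ("C4", "C5"), ("C4", "C3"), ("C5", "C6"), ("C5", "H5"),
    ("C6", "C1"), ("C6", "H6"), ("C3", "C2"), ("C3", "H3"), ("C2", "C1"),
    ("C2", "H2"), ("C1", "C"), ("C", "O1"), ("C", "O2"), ("O1", "HO1"),
]


def connection_table(bonds):
    """Return {atom: [neighbors]} with neighbors in the order the bonds are listed."""
    neighbors = {}
    for atom_1, atom_2 in bonds:
        neighbors.setdefault(atom_1, []).append(atom_2)
        neighbors.setdefault(atom_2, []).append(atom_1)
    return neighbors


table = connection_table(BONDS)
for atom, bonded in table.items():
    print(f"CONECT {atom:<4} {len(bonded)} {' '.join(bonded)}")
```

For this component, the derived neighbor lists (for example C1 bonded to C6, C2, and C) agree with the CONECT rows shown above.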

  1. D. Dimitropoulos, J. Ionides, K. Henrick (2006) UNIT 14.3: Using MSDchem to search the PDB ligand dictionary. In Current Protocols in Bioinformatics (A.D. Baxevanis, R.D.M. Page, G.A. Petsko, L.D. Stein, and G.D. Stormo, eds.) pp 14.3.1-14.3.3. John Wiley & Sons, Hoboken, NJ.
  2. Z. Feng, L. Chen, H. Maddula, O. Akcan, R. Oughtred, H.M. Berman, J. Westbrook. (2004) Ligand Depot: a data warehouse for ligands bound to macromolecules. Bioinformatics 20(13):2153-2155.

