RCSB PDB: Sequence Coordinates Server API Documentation
The RCSB PDB Sequence Coordinates Server compiles alignments between structural and sequence databases and integrates protein positional features from multiple resources. Alignment data is available for NCBI RefSeq (including protein and genomic sequences), UniProt and PDB sequences. Protein positional features are integrated from UniProt, CATH, SCOPe and RCSB PDB and collected from the RCSB PDB Data Warehouse. The server offers a GraphQL-based application programming interface (API) to access the integrated content.
The GraphQL server operates on a single URL/endpoint, https://sequence-coordinates.rcsb.org/graphql, and all GraphQL requests for this service should be directed at this endpoint. The GraphQL HTTP server handles the POST method.
Requests must use HTTP POST with `application/json` as the content type and include the GraphQL request details as JSON in the request body, as defined in the proposed GraphQL over HTTP specification.
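As a minimal sketch of such a request, the Python snippet below builds a POST request whose query arguments are written directly inside the query string. The helper names `build_request` and `run_query` are hypothetical, and the selected fields are only a small subset chosen for illustration from the response fields described later in this document.

```python
import json
import urllib.request

ENDPOINT = "https://sequence-coordinates.rcsb.org/graphql"

# Query arguments are written inline; P01112 is the UniProt example accession
# from the SequenceReference table in this document.
QUERY = """
{
  alignment(from: UNIPROT, to: PDB_ENTITY, queryId: "P01112") {
    query_sequence
    target_alignment {
      target_id
      aligned_regions { query_begin query_end target_begin target_end }
    }
  }
}
"""


def build_request(query, variables=None):
    """Build (but do not send) a JSON POST request for the GraphQL endpoint."""
    payload = {"query": query}
    if variables is not None:
        payload["variables"] = variables
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(query, variables=None, timeout=30):
    """Send the request and decode the JSON response (requires network access)."""
    request = build_request(query, variables)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `run_query(QUERY)` would send the request; building and sending are kept separate so the request can be inspected without touching the network.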
In the example above, the query arguments are written inside the query string. The query arguments can also be passed as dynamic values that are called variables. The variable definition looks like `($id: String!)` in the example below. It lists a variable, prefixed by `$`, followed by its type, in this case `String` (`!` indicates that a non-null argument is required).
The following is equivalent to the previous query:
Where:
With variable defined like so:
Query variables should be sent as part of the POST request in an additional parameter called `variables`.
A valid GraphQL POST request should use the `application/json` content type, must include `query`, and may include `variables`, encoded as a JSON document in the request body. Here's an example for a valid body of a POST request:
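A request body of this shape can be sketched in Python as follows. The helper `build_body` is hypothetical; the `$id` variable follows the `($id: String!)` definition discussed above, and the selected fields are an illustrative subset.

```python
import json

# The query declares a non-null String variable and uses it as queryId.
QUERY_WITH_VARIABLE = """
query ($id: String!) {
  alignment(from: UNIPROT, to: PDB_ENTITY, queryId: $id) {
    target_alignment { target_id }
  }
}
"""


def build_body(query, variables=None):
    """Serialize the JSON body: `query` is mandatory, `variables` is optional."""
    body = {"query": query}
    if variables:
        body["variables"] = variables
    return json.dumps(body)


# P01112 is the UniProt example accession from the SequenceReference table.
body = build_body(QUERY_WITH_VARIABLE, {"id": "P01112"})
```

The resulting string is what would be sent as the POST body with the `application/json` content type.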
Regardless of the method by which the query and variables were sent, the response is returned in JSON format. A query might result in some data and some errors. The successful response will be returned in the form of:
Error handling in REST is fairly straightforward: we check the HTTP headers to get the status of a response. Depending on the HTTP status code we get (for example, 200 or 404), we can tell what the error is and how to go about resolving it. The GraphQL server, on the other hand, will always respond with a 200 OK status code. When an error occurs while processing GraphQL queries, the complete error message is sent to the client with the response. Below is a sample of a typical GraphQL error message when requesting a field that is not defined in the GraphQL schema:
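Because the status code does not signal failure, a client has to inspect the response body. The sketch below assumes the standard GraphQL response layout, with top-level `data` and `errors` keys. The helper `split_response` is hypothetical, and the sample payload is illustrative only: the exact wording of the server's error messages is an assumption.

```python
def split_response(response):
    """Return (data, error_messages); a response may carry both at once."""
    data = response.get("data")
    errors = response.get("errors") or []
    messages = [err.get("message", "unknown error") for err in errors]
    return data, messages


# Illustrative payload only; the real message text may differ.
SAMPLE_ERROR_RESPONSE = {
    "data": None,
    "errors": [
        {"message": "Field 'unknown_field' in type 'Query' is undefined"}
    ],
}

data, messages = split_response(SAMPLE_ERROR_RESPONSE)
if messages:
    print("GraphQL errors:", "; ".join(messages))
```

Returning both values, rather than raising on the first error, keeps any partial data available to the caller.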
GraphQL enables declarative data fetching and lets clients request exactly the data they need. The GraphQL endpoint defines two queries, one for sequence alignments and one for positional features:
- `alignment`
- `annotations`

`alignment(from: SequenceReference!, to: SequenceReference!, queryId: String!, range: [Int!])`

- `from` and `to` parameters codify the origin and target sequence databases, respectively, through a set of enumerated values. The following table describes the type of database identifiers used for each `SequenceReference` value:

| SequenceReference | Database Identifier | Example |
|---|---|---|
| NCBI_GENOME | NCBI RefSeq Chromosome Accession | NC_000001 |
| NCBI_PROTEIN | NCBI RefSeq Protein Accession | NP_789765 |
| UNIPROT | UniProt Accession | P01112 |
| PDB_ENTITY | RCSB PDB Entity Id / CSM Entity Id | 2UZI_3 / AF_AFP68871F1_1 |
| PDB_INSTANCE | RCSB PDB Instance Id / CSM Instance Id | 2UZI.C / AF_AFP68871F1.A |

- `queryId` is a valid identifier in the sequence database defined by `from`
- `range` is an optional integer list (2-tuple) to filter the alignment to a particular region

`annotations(reference: SequenceReference!, queryId: String!, sources: [Source!]!, range: [Int!], filters: [FilterInput!])`

- `reference` and `queryId` indicate the sequence over which annotations will be mapped
  - `reference` is defined by the same enumerated values (`SequenceReference`) used in the `alignment` query
  - `queryId` is a valid identifier in the `reference` database for which the annotations will be requested
- `sources` is an enumerated list defining the annotation collections to be requested
- `range` is an optional integer list (2-tuple) to filter annotations that fall in a particular region
- `filters` is an optional array of `FilterInput` that can be used to select which annotations will be retrieved
  - `operation` is an enumerated value (`OperationType` = `contains` | `equals`) that defines the comparison method
  - `values` is the list of allowed values
  - `source`: only features with the same `Source` will be filtered

Schemas used to encode sequence alignments and positional features are extensions of the data schemas used in the RCSB PDB Data API. The following definitions and structures are relevant to the way that alignments and annotations are encoded:
`AlignmentResponse` is the root document used to encode alignments

- `query_sequence` contains the sequence of the database entry defined by the `from` and `queryId` parameters (ref). This field is `null` when genome-scale alignments are requested (i.e. the `from` value is `NCBI_GENOME`)
- `target_alignment` is a list of `TargetAlignment` documents that describe the different alignments between the sequence identified by the `from` and `queryId` parameters (ref) and the database defined by `to`

`TargetAlignment` is the document structure that describes a sequence alignment between the database entry defined by the `from` and `queryId` parameters (ref) and the entry defined by `to` and `target_id` (see next set of bullet points)

- `target_id` identifies the entry from the database defined by the parameter `to` that is being aligned with the query (defined by the `from` and `queryId` parameters, ref)
- `target_sequence` contains the sequence of the database entry defined by `to` and `target_id`
- `aligned_regions` is a list of `AlignedRegion` documents that defines the sequence alignment through a collection of regions
- `coverage` is a document object that contains different scores related to the sequence alignment (see `Coverage`)
- `orientation` is an integer that identifies the DNA strand of genome alignments (1 positive strand / -1 negative strand)

`AlignedRegion`: sequence alignments are defined by a list of regions that identify the beginning and end positions in the query and target sequences. When alignment data maps residues between protein sequences, indexes are aligned one to one, incrementally from the start position to the end position (see next Figure). When alignments involve genome sequences, 3 consecutive nucleotide indexes are paired with each protein residue, with the possible addition of 1 or 2 nucleotide indexes stored in a separate array, `exon_shift`, to complete the final nucleotide triad (see Figure).
- `query_begin` and `query_end` identify the start and end positions of the alignment in the query sequence (defined by the `from` and `queryId` parameters, ref)
- `target_begin` and `target_end` identify the start and end positions of the alignment in the target sequence (defined by the `to` and `target_id` parameters)
- `exon_shift` is a list of genomic indexes that are needed to complete the last nucleotide triad of a genome-protein sequence alignment (see next Figure)
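The two mapping rules described for `AlignedRegion` can be sketched in Python as below. This is an interpretation under stated assumptions, not the server's implementation: the helper names are hypothetical, the genome case assumes a protein query aligned to a genome target on the positive strand, and the sample region values are invented so that residue 7 maps to nucleotides [8, 13, 14], the mapping given in the document's figure example.

```python
def map_protein_index(region, query_index):
    """Protein to protein: indexes advance one to one within a region."""
    if not region["query_begin"] <= query_index <= region["query_end"]:
        return None  # the index lies outside this aligned region
    return region["target_begin"] + (query_index - region["query_begin"])


def residue_to_codon(region, residue_index):
    """Protein query to genome target (positive strand assumed).

    Each residue takes 3 consecutive nucleotide indexes; an incomplete
    final triad is completed with the indexes stored in exon_shift.
    """
    if not region["query_begin"] <= residue_index <= region["query_end"]:
        return None
    start = region["target_begin"] + 3 * (residue_index - region["query_begin"])
    codon = [i for i in range(start, start + 3) if i <= region["target_end"]]
    if len(codon) < 3:
        codon += list(region.get("exon_shift") or [])
    return codon


# Hypothetical region: residues 5..7 aligned to genome indexes 2..8.
genome_region = {
    "query_begin": 5, "query_end": 7,
    "target_begin": 2, "target_end": 8,
    "exon_shift": [13, 14],
}
print(residue_to_codon(genome_region, 7))  # → [8, 13, 14]
```

For negative-strand alignments (`orientation` of -1) the index arithmetic would need to be reversed, which this sketch does not attempt.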
[Figure: genome alignment with `exon_shift`.] In this example, this situation occurs in the first `AlignedRegion`, where PDB Entity residue index 7 is mapped to genome nucleotide indexes [8,13,14].

`Coverage` is the object that contains different scores related to the sequence alignments

- `query_coverage` and `query_length` contain the percentage of the query sequence that has been aligned and its length (the query sequence is defined by the `from` and `queryId` parameters, ref)
- `target_coverage` and `target_length` contain the percentage of the target sequence that has been aligned and its length (the target sequence is defined by the `to` and `target_id` parameters)

`[AnnotationFeatures]` is the root list of objects that contains the requested annotations

- `Feature` list of documents that describe positional features
- `source` enumerated value that identifies the provenance type of the positional features (ref)
- `target_id` source entry identifier associated with the positional features

`Feature` is the document that describes a positional feature

- `feature_id` Identifier of the feature. When available, the same Id as in the `provenance_source` is used
- `description` Free-form text describing the feature
- `type` Feature category identifier (see Feature Type controlled vocabulary)
- `feature_positions` List of `FeaturePosition` documents that describe the location of the feature
- `provenance_source` Original database or software name used to obtain the feature
- `name` Name associated with the feature (e.g. protein domain name)
- `value` Numerical value associated with the feature

`FeaturePosition` is the document that describes a segment where a feature occurs

- `beg_seq_id` Index at which this segment of the feature begins
- `end_seq_id` Index at which this segment of the feature ends. If the positional feature maps to a single residue, this field will be `null`
- `beg_ori_id` Index at which this segment of the feature begins on the original `provenance_source`. When `reference` and `source` point to the same reference system, this field will be `null`
- `end_ori_id` Index at which this segment of the feature ends on the original `provenance_source`. If the positional feature maps to a single residue, this field will be `null`. When `reference` and `source` point to the same reference system, this field will be `null`
- `value` A numerical value of the feature for this segment

All GraphQL queries are validated and executed against the GraphQL schema. The GraphQL schema contains the elements that define sequence alignments and positional features.
You can use GraphiQL, which is a "graphical interactive in-browser GraphQL IDE", to explore the GraphQL schema. It lets you try different queries and helps with auto-completion and built-in validation. The collapsible Docs panel (Documentation Explorer) on the right side of the page allows you to navigate through the schema definitions. Click on the root Query link to start exploring the GraphQL schema.

This section contains additional examples for using the GraphQL-based RCSB PDB Sequence Coordinates Server API.
Fetch alignments between a UniProt Accession and PDB Entities:
Fetch alignments between a Computed Structure Model and NCBI proteins:
Fetch all positional features for a particular PDB Instance:
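A request of this kind might look like the sketch below, written as a Python string plus a small helper for reading the result. Several names are assumptions and should be checked against the schema in GraphiQL: the `Source` value `UNIPROT`, the `features` field name, and the helper `position_ranges` are not confirmed by this document. The instance identifier `2UZI.C` is the example from the `SequenceReference` table.

```python
# Assumed names: the Source value UNIPROT and the `features` field.
ANNOTATIONS_QUERY = """
query ($id: String!) {
  annotations(reference: PDB_INSTANCE, queryId: $id, sources: [UNIPROT]) {
    source
    target_id
    features {
      type
      description
      feature_positions { beg_seq_id end_seq_id }
    }
  }
}
"""

VARIABLES = {"id": "2UZI.C"}


def position_ranges(feature):
    """Return (begin, end) pairs for a feature.

    A null end_seq_id means the segment covers a single residue,
    so the end is set equal to the beginning.
    """
    ranges = []
    for position in feature.get("feature_positions") or []:
        begin = position["beg_seq_id"]
        end = position.get("end_seq_id")
        ranges.append((begin, begin if end is None else end))
    return ranges
```

The query string and `VARIABLES` would be sent together in the POST body as `query` and `variables`.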
Map all PDB Entities that fall in Human Chromosome 1:
Fetch protein-ligand binding sites for PDB Instances that fall within Human Chromosome 1:
Note that `label_asym_id` is used to identify polymer entity instances.
Fetch alignments between a PDB Instance and NCBI RefSeq proteins:
The following guide will help you migrate from the 1D Coordinates Service API to the Sequence Coordinates Service. This page describes the changes between the two APIs.
Sequence Coordinates Server usage is available under the same terms and conditions as RCSB PDB (see usage policies).
To cite this service, please reference:
Contact info@rcsb.org with questions or feedback about this service.