Add cif-seqres-author and cif-atom-label seq parsers #4928
Draft
StefansM wants to merge 1 commit into biopython:master from StefansM:bugfix/cif-seqres-author-label-chain-ids
See biopython#4922.

* The `CifSeqresIterator` (`cif-seqres`) uses "label" IDs ([`_pdbx_poly_seq_scheme.asym_id`][_pdbx_poly_seq_scheme.asym_id]), which are assigned by the PDB, always start at "A" and move through the alphabet one letter at a time.
* The `CifAtomIterator` (`cif-atom`) uses "author" IDs, which are arbitrarily assigned by the structure authors (this is what you get in traditional fixed-width PDB files).

I agree that this mismatch is probably a bug, though it would be a backwards incompatible change to fix it. If you don't care about the author IDs, and you just need the chain IDs of the sequence from `cif-seqres` to match the chain IDs of the atoms parsed by `MMCIFParser`, you could pass `auth_chains=False` to `MMCIFParser` to use label IDs in the structure as well.

To fix this, the `CifSeqresIterator` could be modified to use author IDs by reading [`_pdbx_poly_seq_scheme.pdb_strand_id`][_pdbx_poly_seq_scheme.pdb_strand_id]. Or the `CifAtomIterator` could pass `auth_chains=False` to `MMCIFParser` to use label IDs everywhere.

[_pdbx_poly_seq_scheme.asym_id]: https://mmcif.wwpdb.org/dictionaries/mmcif_pdbx_v50.dic/Items/_pdbx_poly_seq_scheme.asym_id.html
[_pdbx_poly_seq_scheme.pdb_strand_id]: https://mmcif.wwpdb.org/dictionaries/mmcif_ma.dic/Items/_pdbx_poly_seq_scheme.pdb_strand_id.html
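To make the label/author distinction concrete, here is a minimal standard-library sketch that reads both ID columns from a `_pdbx_poly_seq_scheme` loop and builds a label-to-author chain map. It is not Biopython's implementation and is not taken from this pull request. The mmCIF fragment is invented for illustration, the helper names `read_poly_seq_scheme` and `label_to_author_chain_ids` are hypothetical, and the whitespace-splitting parser assumes no quoted or multi-line values, which real mmCIF files can contain.

```python
# Hypothetical, heavily trimmed mmCIF fragment (invented for illustration):
# two polymer chains whose author IDs ("H", "L") differ from the
# PDB-assigned label IDs ("A", "B").
EXAMPLE_CIF = """\
loop_
_pdbx_poly_seq_scheme.asym_id
_pdbx_poly_seq_scheme.seq_id
_pdbx_poly_seq_scheme.mon_id
_pdbx_poly_seq_scheme.pdb_strand_id
A 1 MET H
A 2 LYS H
B 1 GLY L
B 2 SER L
#
"""


def read_poly_seq_scheme(text):
    """Return one dict per data row of the _pdbx_poly_seq_scheme loop.

    Assumes the text holds only this one loop and that values contain
    no quotes or embedded whitespace (a simplification of real mmCIF).
    """
    columns, rows = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("_pdbx_poly_seq_scheme."):
            # Column header: keep the item name after the category prefix.
            columns.append(line.split(".", 1)[1])
        elif columns and line and not line.startswith("#"):
            # Data row: pair each whitespace-separated value with its column.
            rows.append(dict(zip(columns, line.split())))
    return rows


def label_to_author_chain_ids(rows):
    """Map each label chain ID (asym_id) to its author ID (pdb_strand_id)."""
    mapping = {}
    for row in rows:
        # Every residue row repeats the pair; keep the first one seen.
        mapping.setdefault(row["asym_id"], row["pdb_strand_id"])
    return mapping


rows = read_poly_seq_scheme(EXAMPLE_CIF)
chain_map = label_to_author_chain_ids(rows)
print(chain_map)  # {'A': 'H', 'B': 'L'}
```

In this made-up fragment, a sequence record keyed by label ID would be chain "A", while the atoms for the same polymer would sit under author chain "H". That is the kind of mismatch described above.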
Draft pull request to illustrate #4922.

I hereby agree to dual licence this and any previous contributions under both the Biopython License Agreement AND the BSD 3-Clause License.

I have read the CONTRIBUTING.rst file, have run pre-commit locally, and understand that continuous integration checks will be used to confirm the Biopython unit tests and style checks pass with these changes.

I have added my name to the alphabetical contributors listings in the files NEWS.rst and CONTRIB.rst as part of this pull request, am listed already, or do not wish to be listed. (This acknowledgement is optional.)

Closes #...