GA4GH Variation Representation Python Implementation
ga4gh/vrs-python
VRS-Python provides Python language support and a reference implementation for the GA4GH Variation Representation Specification (VRS).
- Pydantic implementation of GKS core models and VRS models
- Algorithm for generating consistent, globally unique identifiers for variation without a central authority
- Algorithm for performing fully justified allele normalization
- Translating from and to other variant formats
- Annotate VCFs with VRS
- Convert GA4GH objects between inlined and referenced forms
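The identifier algorithm listed above needs no central authority because it derives identifiers from a digest of the object's own content. As a minimal sketch of the digest step only, assuming the truncated SHA-512 digest (`sha512t24u`) described in the VRS specification: the function below is written from scratch with the standard library and is not the ga4gh.vrs API. The serialization of a VRS object into bytes and the type prefix of the final identifier are not shown.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """Base64url-encode the first 24 bytes of the SHA-512 digest of blob."""
    digest = hashlib.sha512(blob).digest()
    # 24 bytes encode to exactly 32 base64url characters, with no padding.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


# The same bytes always produce the same digest, so independent parties
# arrive at the same value without coordinating.
print(sha512t24u(b""))  # z4PhNX7vuL3xVChQ1m2AB9Yg5AULVxXc
```

Because the digest depends only on the input bytes, any difference in how an object is serialized changes the result, which is why the specification fixes a canonical serialization before hashing.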
You are encouraged to browse issues. All known issues are listed there. Please report any issues you find.
- Python >= 3.10
  - Note: Python 3.12 is required for developers contributing to VRS-Python. The Makefile sets up a virtual environment in `venv/3.12` and expects Python to be available as `python3.12`.
- libpq
- postgresql
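To confirm the Python prerequisites before installing, a short check like the following may help. It is only a sketch: the helper name `meets` is hypothetical, and it does not check for libpq or postgresql.

```python
import shutil
import sys

MINIMUM = (3, 10)    # stated minimum for using VRS-Python
DEVELOPER = (3, 12)  # version the Makefile expects for contributors


def meets(version, required) -> bool:
    """True when the (major, minor) part of version is at least required."""
    return tuple(version[:2]) >= tuple(required)


print("Python is new enough to install:", meets(sys.version_info, MINIMUM))
# Contributors additionally need an interpreter reachable as `python3.12`.
print("python3.12 found on PATH:", shutil.which("python3.12") is not None)
```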
You can use Homebrew to install the prerequisites. See the Homebrew documentation for how to install. Make sure Homebrew is up-to-date by running `brew update`.
brew install libpq
brew install python3
brew install postgresql@14
sudo apt install gcc libpq-dev python3-dev
VRS-Python is available on PyPI.
pip install 'ga4gh.vrs[extras]'
The `[extras]` argument tells pip to install packages to fulfill the dependencies of the `ga4gh.vrs.extras` package.
The `ga4gh.vrs.extras` modules are not part of the VR spec per se. They are bundled with ga4gh.vrs for development and installation convenience. These modules depend directly and indirectly on external data sources of sequences, transcripts, and genome-transcript alignments.
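After running pip, one way to confirm that the distribution is visible to the current interpreter is to query its installed version. This is a sketch using only the standard library; the helper name `installed_version` is hypothetical, and the check cannot tell whether the `[extras]` dependencies were installed.

```python
from importlib import metadata


def installed_version(distribution: str):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# "ga4gh.vrs" is the distribution name used in the pip command above.
print("ga4gh.vrs version:", installed_version("ga4gh.vrs"))
```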
First, you must install a local SeqRepo:
pip install seqrepo
export SEQREPO_VERSION=2024-12-20  # or newer if available -- check `seqrepo list-remote-instances`
sudo mkdir -p /usr/local/share/seqrepo
sudo chown $USER /usr/local/share/seqrepo
seqrepo pull -i $SEQREPO_VERSION
seqrepo update-latest
If you encounter a permission error similar to the one below:
PermissionError: [Error 13] Permission denied: '/usr/local/share/seqrepo/2024-12-20._fkuefgd' -> '/usr/local/share/seqrepo/2024-12-20'
Try moving data manually with `sudo`:
sudo mv /usr/local/share/seqrepo/$SEQREPO_VERSION.* /usr/local/share/seqrepo/$SEQREPO_VERSION
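To see whether the pull finished or stopped at the rename shown in the error above, you can look for the final directory and for leftover temporary directories next to it. The sketch below uses only the standard library; the helper name `find_instance` is hypothetical, and the assumption that temporary directories are named `<version>.<suffix>` is taken from the error message, not from SeqRepo documentation.

```python
import os
from pathlib import Path

SEQREPO_ROOT = Path("/usr/local/share/seqrepo")


def find_instance(root: Path, version: str):
    """Return (final_dir_exists, leftover_temp_dirs) for one SeqRepo version."""
    if not root.is_dir():
        return False, []
    # Leftover downloads look like "2024-12-20._fkuefgd" in the error above.
    leftovers = sorted(p for p in root.glob(f"{version}.*") if p.is_dir())
    return (root / version).is_dir(), leftovers


version = os.environ.get("SEQREPO_VERSION", "2024-12-20")
ready, leftovers = find_instance(SEQREPO_ROOT, version)
print(f"{SEQREPO_ROOT / version} present: {ready}")
for path in leftovers:
    print("leftover temporary directory:", path)
```

If the final directory is missing but a leftover directory is listed, the `sudo mv` command above is the likely fix.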
To make installation easy, we recommend using Docker to install the other Biocommons tools: SeqRepo REST and UTA. If you would like to use local instances of UTA, see UTA directly. We do provide some additional setup help here.
Next, run the following commands:
docker volume create --name=uta_vol
docker volume create --name=seqrepo_vol
docker-compose up
This should start three containers:
- seqrepo: downloads seqrepo into a docker volume and exits
- seqrepo-rest-service: a REST service on seqrepo (localhost:5000)
- uta: a database of transcripts and alignments (localhost:5432)
Check that the containers are running:
$ docker ps
CONTAINER ID   IMAGE                                    //  NAMES
86e872ab0c69   biocommons/seqrepo-rest-service:latest   //  vrs-python_seqrepo-rest-service_1
a40576b8cf1f   biocommons/uta:uta_20241220              //  vrs-python_uta_1
Depending on your network and host, the first run is likely to take 5-15 minutes in order to download and install data. Subsequent startups should be nearly instantaneous.
You can test UTA and seqrepo installations like so:
$ psql -XAt postgres://anonymous@localhost/uta -c 'select count(*) from uta_20241220.transcript'
314227
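A quicker first check, before querying the database, is whether anything is listening on the two ports named above. The following sketch uses only the standard library; the helper name `port_open` is hypothetical, and a successful connection shows only that the port accepts TCP connections, not that the service behind it is healthy.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports as listed for the containers above.
SERVICES = {"seqrepo-rest-service": 5000, "uta": 5432}
for name, port in SERVICES.items():
    status = "reachable" if port_open("localhost", port) else "not reachable"
    print(f"{name} (localhost:{port}): {status}")
```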
Here are some things to try.
Bring up one service at a time. For example, if you haven't downloaded seqrepo yet, you might see this:
$ docker-compose up seqrepo-rest-service
Starting vrs-python_seqrepo-rest-service_1 ... done
Attaching to vrs-python_seqrepo-rest-service_1
seqrepo-rest-service_1 | 2022-07-26 15:59:59 seqrepo_rest_service.__main__[1] INFO Using seqrepo_dir='/usr/local/share/seqrepo/2024-12-20' from command line
⋮
seqrepo-rest-service_1 | OSError: Unable to open SeqRepo directory /usr/local/share/seqrepo/2024-12-20
vrs-python_seqrepo-rest-service_1 exited with code 1
The ga4gh/vrs-python repo embeds the ga4gh/vrs repo as a git submodule for testing purposes. Each ga4gh.vrs package on PyPI embeds a particular version of VRS. The correspondences between the packages that are currently maintained may be summarized as:
vrs-python branch | vrs-python tag/version | vrs branch | vrs version |
---|---|---|---|
main (default branch) | 2.x | 2.x | 2.x |
1.x | 0.8.x | 1.x | 1.x |
⚠ Note: Only the 2.x branch is being actively maintained. The 1.x branch will only be maintained for bug fixes.
⚠ Developers: See the development section below for recommendations for using submodules gracefully (and without causing problems for others!).
The correspondences between the packages that are no longer maintained may be summarized as:
vrs-python branch | vrs-python tag/version | vrs branch | vrs version |
---|---|---|---|
0.9 | 0.9.x | metaschema-update | N/A |
0.7 | 0.7.x | 1.2 | 1.2.x |
0.6 | 0.6.x | 1.1 | 1.1.x |
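If you need this correspondence in a script, for example to warn when an environment pins an unmaintained package series, the two tables can be transcribed into a lookup. This is an illustrative sketch only: the names `PACKAGE_TO_VRS` and `embedded_vrs` are hypothetical and not part of ga4gh.vrs, and the data is copied from the tables above.

```python
# Package version series -> (embedded VRS version, still maintained?).
# Rows transcribed from the two tables above.
PACKAGE_TO_VRS = {
    "2": ("2.x", True),
    "0.8": ("1.x", True),
    "0.9": ("N/A", False),
    "0.7": ("1.2.x", False),
    "0.6": ("1.1.x", False),
}


def embedded_vrs(package_version: str):
    """Look up a ga4gh.vrs package version such as '0.8.4' or '2.1.0'."""
    parts = package_version.split(".")
    # Try the major.minor series first (0.8), then the major series (2).
    for key in (".".join(parts[:2]), parts[0]):
        if key in PACKAGE_TO_VRS:
            return PACKAGE_TO_VRS[key]
    return None  # version series not listed in the tables


print(embedded_vrs("2.1.0"))  # ('2.x', True)
print(embedded_vrs("0.7.3"))  # ('1.2.x', False)
```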
This section is intended for developers who contribute to VRS-Python.
Then, clone your fork and initialize a development environment:
git clone --recurse-submodules git@github.com:YOUR_GITHUB_ID/vrs-python.git
cd vrs-python
make devready
source venv/3.12/bin/activate
This setup includes pre-commit hooks. If you create a virtual environment manually, be sure to install the hooks yourself; otherwise, commits may fail during CI/CD checks:
source venv/3.12/bin/activate
pre-commit install
If you already cloned the repo but forgot to include `--recurse-submodules`, you can run:
git submodule update --init --recursive
vrs-python embeds vrs as a submodule, only for testing purposes. When checking out vrs-python and switching branches, it is important to make sure that the submodule tracks vrs-python correctly. The recommended way to do this is `git config --global submodule.recurse true`. If you don't set submodule.recurse, developers and reviewers must be extremely careful to not accidentally upgrade or downgrade schemas with respect to vrs-python.
Alternatively, see `misc/githooks/`.
This package implements typical unit tests for ga4gh.core and ga4gh.vrs. This package also implements the compliance tests from vrs (vrs/validation) in the tests/validation/ directory.
To run tests:
make test
The notebooks do not require you to set up SeqRepo or UTA from Install External Data Sources.
Binder allows you to create custom computing environments that can be shared and used by many remote users.
You can access the notebooks on Binder here.
Terra is a cloud platform for biomedical research developed by the Broad Institute, Microsoft and Verily. The platform includes preconfigured environments that provide user-friendly access to various applications commonly used in bioinformatics, including Jupyter Notebooks.
We have created a public `VRS-demo-notebooks` workspace in Terra that contains the demo notebooks along with instructions for running them with minimal setup. To get started, see either the `VRS-demo-notebooks` workspace or the `Terra.ipynb` notebook in this repository.
VS Code is a code editor developed by Microsoft. It is lightweight, highly customizable, and supports a wide range of programming languages, with a robust extension system. You can download VS Code here.
- Open VS Code.
- Use the Extensions view (Ctrl+Shift+X or ⌘+Shift+X) to install the Jupyter extension.
- Navigate to your vrs-python project folder and open it in VS Code.
- In a notebook, click `Select Kernel` at the top right. Select the option where the path is `venv/3.12/bin/python3`. See here for more information on managing Jupyter Kernels in VS Code.
- After selecting the kernel you can now run the notebook.
A stand-alone security review has been performed on the specification itself. This implementation is offered as-is, and without any security guarantees. It will need an independent security review before it can be considered ready for use in security-critical applications. If you integrate this code into your application it is AT YOUR OWN RISK AND RESPONSIBILITY to arrange for a security audit.