GA4GH Variation Representation Python Implementation
ga4gh/vrs-python
VRS-Python provides Python language support and a reference implementation for the GA4GH Variation Representation Specification (VRS).
- Pydantic implementation of GKS core models and VRS models
- Algorithm for generating consistent, globally unique identifiers for variation without a central authority
- Algorithm for performing fully justified allele normalization
- Translating from and to other variant formats
- Annotate VCFs with VRS
- Convert GA4GH objects between inlined and referenced forms
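To give a feel for how identifiers can be agreed on without a central authority, here is a minimal sketch of the GA4GH truncated-digest idea: take the SHA-512 of some bytes, keep the first 24 bytes, and base64url-encode them. It uses only the standard library. The function name `sha512t24u` and the sample input are illustrative; this is not the package's own code path, and a real computed identifier also depends on how the object is serialized before digesting, which is not shown here.

```python
import base64
import hashlib


def sha512t24u(blob: bytes) -> str:
    """Return the base64url encoding of the first 24 bytes of SHA-512(blob)."""
    digest = hashlib.sha512(blob).digest()
    # 24 bytes encode to exactly 32 base64url characters, with no padding.
    return base64.urlsafe_b64encode(digest[:24]).decode("ascii")


if __name__ == "__main__":
    # The same bytes always yield the same digest on any machine,
    # so two parties can derive matching identifiers independently.
    print(sha512t24u(b"ACGT"))
```

Because the digest depends only on the input bytes, any two installations that serialize an object the same way should arrive at the same identifier.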
You are encouraged to browse issues. All known issues are listed there. Please report any issues you find.
- Python >= 3.10
  - Note: Python 3.12 is required for developers contributing to VRS-Python. The Makefile sets up a virtual environment in `venv/3.12` and expects Python to be available as `python3.12`.
- libpq
- postgresql
You can use Homebrew to install the prerequisites. See the Homebrew documentation for how to install. Make sure Homebrew is up-to-date by running `brew update`.
brew install libpq
brew install python3
brew install postgresql@14
sudo apt install gcc libpq-dev python3-dev
VRS-Python is available on PyPI.
pip install 'ga4gh.vrs[extras]'
The `[extras]` argument tells pip to install packages to fulfill the dependencies of the `ga4gh.vrs.extras` package.
The `ga4gh.vrs.extras` modules are not part of the VR spec per se. They are bundled with ga4gh.vrs for development and installation convenience. These modules depend directly and indirectly on external data sources of sequences, transcripts, and genome-transcript alignments.
First, you must install a local SeqRepo:
pip install seqrepo
export SEQREPO_VERSION=2024-12-20  # or newer if available -- check `seqrepo list-remote-instances`
sudo mkdir -p /usr/local/share/seqrepo
sudo chown $USER /usr/local/share/seqrepo
seqrepo pull -i $SEQREPO_VERSION
seqrepo update-latest
If you encounter a permission error similar to the one below:
PermissionError: [Error 13] Permission denied: '/usr/local/share/seqrepo/2024-12-20._fkuefgd' -> '/usr/local/share/seqrepo/2024-12-20'
Try moving data manually with `sudo`:
sudo mv /usr/local/share/seqrepo/$SEQREPO_VERSION.* /usr/local/share/seqrepo/$SEQREPO_VERSION
To make installation easy, we recommend using Docker to install the other Biocommons tools: SeqRepo REST and UTA. If you would like to use local instances of UTA, see UTA directly. We do provide some additional setup help here.
Next, run the following commands:
docker volume create --name=uta_vol
docker volume create --name=seqrepo_vol
docker-compose up
This should start three containers:
- seqrepo: downloads seqrepo into a docker volume and exits
- seqrepo-rest-service: a REST service on seqrepo (localhost:5000)
- uta: a database of transcripts and alignments (localhost:5432)
Check that the seqrepo-rest-service and uta containers are running:
$ docker ps
CONTAINER ID   IMAGE                                    // NAMES
86e872ab0c69   biocommons/seqrepo-rest-service:latest   // vrs-python_seqrepo-rest-service_1
a40576b8cf1f   biocommons/uta:uta_20241220              // vrs-python_uta_1
Depending on your network and host, the first run is likely to take 5-15 minutes in order to download and install data. Subsequent startups should be nearly instantaneous.
You can test UTA and seqrepo installations like so:
$ psql -XAt postgres://anonymous@localhost/uta -c 'select count(*) from uta_20241220.transcript'
314227
$ curl 'http://127.0.0.1:5000/seqrepo/1/sequence/refseq:NM_000059.4?end=20'
AGAGGCGGAGCCGCTGTGGC
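The SeqRepo check can also be run from Python. The following is a minimal sketch using only the standard library; it mirrors the URL shape of the curl call above. The helper names `seqrepo_sequence_url` and `fetch_sequence` are hypothetical, and the optional `start` parameter is an assumption about the REST service that the curl example does not exercise.

```python
import urllib.parse
import urllib.request


def seqrepo_sequence_url(base_url, accession, start=None, end=None):
    """Build a sequence URL in the same shape as the curl test (hypothetical helper)."""
    url = f"{base_url.rstrip('/')}/seqrepo/1/sequence/{accession}"
    # Only include the query parameters that were actually supplied.
    # `start` is an assumed parameter; the curl example only shows `end`.
    params = {k: v for k, v in (("start", start), ("end", end)) if v is not None}
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def fetch_sequence(base_url, accession, start=None, end=None, timeout=10):
    """Fetch sequence text; needs the seqrepo-rest-service container running."""
    url = seqrepo_sequence_url(base_url, accession, start, end)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("ascii")
```

With the containers up, calling `fetch_sequence("http://127.0.0.1:5000", "refseq:NM_000059.4", end=20)` should return the same bases as the curl command.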
Here are some things to try.
Bring up one service at a time. For example, if you haven't downloaded seqrepo yet, you might see this:
$ docker-compose up seqrepo-rest-service
Starting vrs-python_seqrepo-rest-service_1 ... done
Attaching to vrs-python_seqrepo-rest-service_1
seqrepo-rest-service_1 | 2022-07-26 15:59:59 seqrepo_rest_service.__main__[1] INFO Using seqrepo_dir='/usr/local/share/seqrepo/2024-12-20' from command line
⋮
seqrepo-rest-service_1 | OSError: Unable to open SeqRepo directory /usr/local/share/seqrepo/2024-12-20
vrs-python_seqrepo-rest-service_1 exited with code 1
If you are having issues with UTA: if your machine is already running postgresql on port 5432 (which is the default on many systems), you may see an error message such as this:
$ psql -XAt postgres://anonymous@localhost/uta -c 'select count(*) from uta_20241220.transcript'
psql: error: connection to server at "localhost" (::1), port 5432 failed: FATAL: role "anonymous" does not exist
You can move your UTA installation to a different port as follows:
- Select a new port number for UTA, and verify that the port is available. For example, if you have sudo privileges on your machine, you can verify the port is available with the `lsof` command:
  sudo lsof -i :[port_number]
  If the port is available, the output of this command should be 0 lines long.
- Edit your docker-compose.yml file. In the lines
  ports:
    - 5432:5432
  replace the first number with a different number to specify a port on your local machine.
  ports:
    - [your_port_number]:5432
- Repeat the `docker-compose up` command.
- Repeat the command above to verify that there is now a docker command listening at this port.
  sudo lsof -i :[your_port_number]
  This time, you should see that a docker command is using the port.
- Specify the new port in your psql command:
  $ psql -XAt postgres://anonymous@localhost/uta -p [your_port_number] -c 'select count(*) from uta_20241220.transcript'
- Set the `UTA_DB_URL` environment variable to specify your port.
  export UTA_DB_URL="postgresql://anonymous@localhost:[your_port_number]/uta/uta_20241220"
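If `lsof` or sudo is not available, a rough equivalent of the port check can be sketched in Python. This is an illustration under stated assumptions, not part of VRS-Python: `port_in_use` and `uta_db_url` are hypothetical helper names, the check only detects something accepting TCP connections on the loopback address, and the default schema name is copied from the examples above.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


def uta_db_url(port, schema="uta_20241220"):
    """Build a UTA_DB_URL value in the same shape as the export line above."""
    return f"postgresql://anonymous@localhost:{port}/uta/{schema}"
```

Before editing docker-compose.yml, `port_in_use(your_port_number)` should be false; after `docker-compose up`, it should be true. The string from `uta_db_url(your_port_number)` is what would be exported as `UTA_DB_URL`.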
If you are having issues with SeqRepo, check to see if there is another process using port 5000, and try moving to a different port:
- Follow the instructions above to see if port 5000 is already in use.
- If it is, edit your docker-compose.yml file to specify a different port. In the lines
  ports:
    - 5000:5000
  replace the first number with a different number to specify a port on your local machine.
  ports:
    - [your_port_number]:5000
- Repeat the `docker-compose up` command.
- Test the SeqRepo REST API service with this new port:
  curl 'http://127.0.0.1:[your_port_number]/seqrepo/1/sequence/refseq:NM_000059.4?end=20'
- Set the `GA4GH_VRS_DATAPROXY_URI` environment variable to point to this URL:
  $ export GA4GH_VRS_DATAPROXY_URI=http://localhost:[your_port_number]/seqrepo
  $ export SEQREPO_URI=http://localhost:[your_port_number]
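The port edits for both UTA and SeqRepo follow the same pattern: only the host side of the `HOST:CONTAINER` mapping changes, and the environment values follow from the chosen port. The sketch below shows that pattern. Both helper names are hypothetical, only the simple two-part mapping form is handled, and the URL shapes are copied from the export lines above rather than taken from the library.

```python
def remap_host_port(mapping: str, new_host_port: int) -> str:
    """Replace the host side of a 'HOST:CONTAINER' port mapping."""
    parts = mapping.strip().split(":")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected 'HOST:CONTAINER', got {mapping!r}")
    if not 0 < new_host_port < 65536:
        raise ValueError(f"port out of range: {new_host_port}")
    # The container-side port stays fixed; only the local port moves.
    return f"{new_host_port}:{parts[1]}"


def seqrepo_env(port: int) -> dict:
    """Return the two environment values from the step above for a given port."""
    return {
        "GA4GH_VRS_DATAPROXY_URI": f"http://localhost:{port}/seqrepo",
        "SEQREPO_URI": f"http://localhost:{port}",
    }
```

For example, `remap_host_port("5000:5000", 5001)` gives the value to put under `ports:` and `seqrepo_env(5001)` gives the matching values to export.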
The ga4gh/vrs-python repo embeds the ga4gh/vrs repo as a git submodule for testing purposes. Each ga4gh.vrs package on PyPI embeds a particular version of VRS. The correspondences between the packages that are currently maintained may be summarized as:
vrs-python branch | vrs-python tag/version | vrs branch | vrs version |
---|---|---|---|
main (default branch) | 2.x | 2.x | 2.x |
1.x | 0.8.x | 1.x | 1.x |
⚠ Note: Only the 2.x branch is being actively maintained. The 1.x branch will only be maintained for bug fixes.
⚠ Developers: See the development section below for recommendations for using submodules gracefully (and without causing problems for others!).
The correspondences between the packages that are no longer maintained may be summarized as:
vrs-python branch | vrs-python tag/version | vrs branch | vrs version |
---|---|---|---|
0.9 | 0.9.x | metaschema-update | N/A |
0.7 | 0.7.x | 1.2 | 1.2.x |
0.6 | 0.6.x | 1.1 | 1.1.x |
This section is intended for developers who contribute to VRS-Python.
First fork the repository on GitHub. Then, clone your fork and initialize a development environment:
git clone --recurse-submodules git@github.com:YOUR_GITHUB_ID/vrs-python.git
cd vrs-python
make devready
source venv/3.12/bin/activate
This setup includes pre-commit hooks. If you create a virtual environment manually, be sure to install the hooks yourself; otherwise, commits may fail during CI/CD checks:
source venv/3.12/bin/activate
pre-commit install
If you already cloned the repo, but forgot to include `--recurse-submodules`, you can run:
git submodule update --init --recursive
vrs-python embeds vrs as a submodule, only for testing purposes. When checking out vrs-python and switching branches, it is important to make sure that the submodule tracks vrs-python correctly. The recommended way to do this is `git config --global submodule.recurse true`. If you don't set submodule.recurse, developers and reviewers must be extremely careful not to accidentally upgrade or downgrade schemas with respect to vrs-python.
Alternatively, see `misc/githooks/`.
This package implements typical unit tests for ga4gh.core and ga4gh.vrs. This package also implements the compliance tests from vrs (vrs/validation) in the tests/validation/ directory.
To run tests:
make test
The notebooks do not require you to set up SeqRepo or UTA from Install External Data Sources.
Binder allows you to create custom computing environments that can be shared and used by many remote users.
You can access the notebooks on Binder here.
Terra is a cloud platform for biomedical research developed by the Broad Institute, Microsoft and Verily. The platform includes preconfigured environments that provide user-friendly access to various applications commonly used in bioinformatics, including Jupyter Notebooks.
We have created a public `VRS-demo-notebooks` workspace in Terra that contains the demo notebooks along with instructions for running them with minimal setup. To get started, see either the `VRS-demo-notebooks` workspace or the `Terra.ipynb` notebook in this repository.
VS Code is a code editor developed by Microsoft. It is lightweight, highly customizable, and supports a wide range of programming languages, with a robust extension system. You can download VS Code here.
- Open VS Code.
- Use the Extensions view (Ctrl+Shift+X or ⌘+Shift+X) to install the Jupyter extension.
- Navigate to your vrs-python project folder and open it in VS Code.
- In a notebook, click `Select Kernel` at the top right. Select the option where the path is `venv/3.12/bin/python3`. See here for more information on managing Jupyter Kernels in VS Code.
- After selecting the kernel you can now run the notebook.
A stand-alone security review has been performed on the specification itself. This implementation is offered as-is, and without any security guarantees. It will need an independent security review before it can be considered ready for use in security-critical applications. If you integrate this code into your application it is AT YOUR OWN RISK AND RESPONSIBILITY to arrange for a security audit.