PheKnowLator: Heterogeneous Biomedical Knowledge Graphs and Benchmarks Constructed Under Alternative Semantic Models
PheKnowLator (Phenotype Knowledge Translator), or pkt_kg, is the first fully customizable knowledge graph (KG) construction framework enabling users to build complex KGs that are Semantic Web compliant and amenable to automatic Web Ontology Language (OWL) reasoning, generate contemporary property graphs, and are importable by today’s popular graph toolkits. Please see the project Wiki for additional information.
📢 Please see our preprint 👉 https://arxiv.org/abs/2307.05727
- A Knowledge Graph Sharing Hub: Prebuilt KGs and associated metadata. Each KG is provided as triple edge lists, OWL API-formatted RDF/XML, and NetworkX graph-pickled MultiDiGraphs. We also make text files available containing node and relation metadata.
- A Knowledge Graph Building Framework: An automated Python 3 library designed for optimized construction of semantically-rich, large-scale biomedical KGs from complex heterogeneous data. The framework also includes Jupyter Notebooks to greatly simplify the generation of required input dependencies.
NOTE. A table listing and describing all output files generated for each build, along with example output from each file, can be found here.
- Join and/or start a Discussion
- The Project Wiki for available knowledge graphs, pkt_kg data sources, and the knowledge graph construction process
- A Zenodo Community has been established to provide access to software releases, presentations, and preprints related to this project
This program requires Python version 3.6. To install the library from PyPI, run:
pip install pkt_kg
You can also clone the repository directly from GitHub by running:
git clone https://github.com/callahantiff/PheKnowLator.git
Note. Sometimes OWLTools, which comes with the cloned/forked repository (./pkt_kg/libs/owltools), loses its "executable" permission. To avoid any potential issues, I recommend running the following in the terminal from the PheKnowLator directory:
chmod +x pkt_kg/libs/owltools
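For setups where the permission needs to be checked from Python rather than the shell, the following is a minimal sketch that behaves roughly like the chmod +x command above. The helper name ensure_executable is hypothetical and is not part of pkt_kg; only the owltools path comes from this README.

```python
import stat
from pathlib import Path


def ensure_executable(path):
    """Add the execute bits for user, group, and others (roughly `chmod +x`).

    Returns True if the mode was changed, False if it was already executable.
    """
    p = Path(path)
    mode = p.stat().st_mode
    wanted = mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH
    if wanted == mode:
        return False
    p.chmod(wanted)
    return True


if __name__ == "__main__":
    # Path taken from the note above; run from the PheKnowLator directory.
    owltools = Path("pkt_kg/libs/owltools")
    if owltools.exists():
        changed = ensure_executable(owltools)
        print("permissions updated" if changed else "already executable")
    else:
        print(f"{owltools} not found; run this from the PheKnowLator directory")
```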
The pkt_kg library requires a specific project directory structure.
- If you plan to run the code from a cloned version of this repository, then no additional steps are needed.
- If you are planning to use the library without cloning the repository, please make sure that your project directory matches the following:
PheKnowLator/
    |
    |---- resources/
              |
              |---- construction_approach/
              |---- edge_data/
              |---- knowledge_graphs/
              |---- node_data/
              |---- ontologies/
              |---- owl_decoding/
              |---- relations_data/
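When the library is used without cloning the repository, the layout above has to be created by hand. The sketch below shows one way to do that with the standard library. The function names create_project_layout and missing_dirs are hypothetical helpers, not part of pkt_kg; the directory names are copied from the layout above.

```python
from pathlib import Path

# Subdirectory names copied from the required layout shown above.
RESOURCE_SUBDIRS = [
    "construction_approach",
    "edge_data",
    "knowledge_graphs",
    "node_data",
    "ontologies",
    "owl_decoding",
    "relations_data",
]


def missing_dirs(root):
    """Return the names of expected resources/ subdirectories that do not exist."""
    resources = Path(root) / "resources"
    return [name for name in RESOURCE_SUBDIRS if not (resources / name).is_dir()]


def create_project_layout(root):
    """Create any missing resources/ subdirectories under root; return the new paths."""
    resources = Path(root) / "resources"
    created = []
    for name in missing_dirs(root):
        target = resources / name
        target.mkdir(parents=True)
        created.append(target)
    return created


if __name__ == "__main__":
    # "PheKnowLator" is the project root name used in this README.
    for path in create_project_layout("PheKnowLator"):
        print(f"created {path}")
```

Running the script a second time creates nothing, because only missing directories are added.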
Several input documents must be created before the pkt_kg library can be used. Each input document is listed below by knowledge graph build step.
This code requires three documents within the resources directory to run successfully. For more information on these documents, see Document Dependencies.
For assistance in creating these documents, please run the following from the root directory:
python3 generates_dependency_documents.py
Prior to running this step, make sure that all mapping and filtering data referenced in resources/resource_info.txt have been created. To generate these data yourself, please see the Data_Preparation.ipynb Jupyter Notebook for detailed examples of the steps used to build the v2.0.0 knowledge graph.
Note. To ensure reproducibility, after downloading data, a metadata file is output for the ontologies (ontology_source_metadata.txt) and edge data sources (edge_source_metadata.txt).
The KG Construction Wiki page provides a detailed description of the knowledge construction process (please see the knowledge graph README for more information). Please make sure the documents listed below are present in the specified location prior to constructing a knowledge graph. Click on each document for additional information. Note that cloning this library will include a version of these documents that points to the current build. If you use this version, then there is no need to download anything prior to running the program.
- resources/construction_approach/subclass_construction_map.pkl
- resources/Master_Edge_List_Dict.json ➞ automatically created after edge list construction
- resources/node_data/node_metadata_dict.pkl ➞ if adding metadata for new edges to the knowledge graph
- resources/knowledge_graphs/PheKnowLator_MergedOntologies*.owl ➞ see ontology README for information
- resources/relations_data/RELATIONS_LABELS.txt
- resources/relations_data/INVERSE_RELATIONS.txt ➞if including inverse relations
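Before starting a build, it can help to confirm that the documents listed above are in place. The following is a minimal sketch of such a check. The function name find_missing and the split into ALWAYS_NEEDED and CONDITIONAL are assumptions made for illustration, based only on the notes next to each item above; pkt_kg itself may perform its own checks.

```python
from pathlib import Path

# Paths copied from the list above; the wildcard entry is matched as a glob pattern.
ALWAYS_NEEDED = [
    "resources/construction_approach/subclass_construction_map.pkl",
    "resources/Master_Edge_List_Dict.json",
    "resources/knowledge_graphs/PheKnowLator_MergedOntologies*.owl",
    "resources/relations_data/RELATIONS_LABELS.txt",
]

# Only needed for the options named in the list above (assumed grouping).
CONDITIONAL = {
    "node metadata": "resources/node_data/node_metadata_dict.pkl",
    "inverse relations": "resources/relations_data/INVERSE_RELATIONS.txt",
}


def find_missing(root, patterns):
    """Return the patterns that match no existing file under root."""
    root = Path(root)
    return [pattern for pattern in patterns if not any(root.glob(pattern))]


if __name__ == "__main__":
    # Run from the project root directory.
    for pattern in find_missing(".", ALWAYS_NEEDED):
        print(f"missing: {pattern}")
    for option, pattern in CONDITIONAL.items():
        if find_missing(".", [pattern]):
            print(f"missing (needed only for {option}): {pattern}")
```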
pkt_kg can be run via the provided main.py script, using the main.ipynb Jupyter Notebook, or using a Docker container.
The program can be run locally using the main.py script or using the main.ipynb Jupyter Notebook. An example of the workflow used in both of these approaches is shown below.
import psutil
import ray
import pkt_kg as pkt

# initialize ray
ray.init()

# determine number of cpus available
available_cpus = psutil.cpu_count(logical=False)

# DOWNLOAD DATA
# ontology data
ont = pkt.OntData('resources/ontology_source_list.txt')
ont.downloads_data_from_url()
ont.writes_source_metadata_locally()

# edge data sources
edges = pkt.LinkedData('resources/edge_source_list.txt')
edges.downloads_data_from_url()
edges.writes_source_metadata_locally()

# CREATE MASTER EDGE LIST
combined_edges = dict(edges.data_files, **ont.data_files)

# initialize edge dictionary class
master_edges = pkt.CreatesEdgeList(data_files=combined_edges,
                                   source_file='./resources/resource_info.txt')
master_edges.runs_creates_knowledge_graph_edges(source_file='./resources/resource_info.txt',
                                                data_files=combined_edges,
                                                cpus=available_cpus)

# BUILD KNOWLEDGE GRAPH
# partial build, subclass construction approach, with inverse relations and node metadata, and decode owl
kg = pkt.PartialBuild(kg_version='v2.0.0',
                      write_location='./resources/knowledge_graphs',
                      construction='subclass',
                      node_data='yes',
                      inverse_relations='yes',
                      cpus=available_cpus,
                      decode_owl='yes')

kg.construct_knowledge_graph()
ray.shutdown()
The example below provides the details needed to run pkt_kg using ./main.py.
python3 main.py -h

usage: main.py [-h] [-p CPUS] -g ONTS -e EDG -a APP -t RES -b KG -o OUT -n NDE -r REL -s OWL -m KGM

PheKnowLator: This program builds a biomedical knowledge graph using Open Biomedical Ontologies
and linked open data. The program takes the following arguments:

optional arguments:
  -h, --help            show this help message and exit
  -p CPUS, --cpus CPUS  # workers to use; defaults to use all available cores
  -g ONTS, --onts ONTS  name/path to text file containing ontologies
  -e EDG, --edg EDG     name/path to text file containing edge sources
  -a APP, --app APP     construction approach to use (i.e. instance or subclass)
  -t RES, --res RES     name/path to text file containing resource_info
  -b KG, --kg KG        the build, can be "partial", "full", or "post-closure"
  -o OUT, --out OUT     name/path to directory where to write knowledge graph
  -r REL, --rel REL     yes/no - adding inverse relations to knowledge graph
  -s OWL, --owl OWL     yes/no - removing OWL Semantics from knowledge graph
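When main.py is launched from another script, assembling the argument list programmatically avoids typos in the flag values. The sketch below is a hypothetical helper, not part of pkt_kg. It validates only the values that the help text above spells out (--app, --kg, --rel, --owl) and passes --nde and --kgm through unchecked, since the help output above does not describe them; their long forms are taken from the docker run example in this README.

```python
import shlex

# Allowed values as stated in the help text above.
APPROACHES = {"instance", "subclass"}
KG_BUILDS = {"partial", "full", "post-closure"}
YES_NO = {"yes", "no"}


def build_main_command(*, onts, edg, app, res, kg, out, nde, rel, owl, kgm, cpus=None):
    """Return the argument list for `python3 main.py`, validating documented choices."""
    if app not in APPROACHES:
        raise ValueError(f"--app must be one of {sorted(APPROACHES)}, got {app!r}")
    if kg not in KG_BUILDS:
        raise ValueError(f"--kg must be one of {sorted(KG_BUILDS)}, got {kg!r}")
    for flag, value in (("--rel", rel), ("--owl", owl)):
        if value not in YES_NO:
            raise ValueError(f"{flag} must be 'yes' or 'no', got {value!r}")

    cmd = ["python3", "main.py"]
    if cpus is not None:  # -p/--cpus is the only optional flag in the usage line
        cmd += ["--cpus", str(cpus)]
    cmd += ["--onts", onts, "--edg", edg, "--app", app, "--res", res,
            "--kg", kg, "--out", out, "--nde", nde, "--rel", rel,
            "--owl", owl, "--kgm", kgm]
    return cmd


if __name__ == "__main__":
    command = build_main_command(
        onts="resources/ontology_source_list.txt",
        edg="resources/edge_source_list.txt",
        app="subclass",
        res="resources/resource_info.txt",
        kg="full",
        out="resources/knowledge_graphs",
        nde="yes",
        rel="yes",
        owl="no",
        kgm="yes",
    )
    # Print a shell-safe version; pass `command` to subprocess.run to execute it.
    print(" ".join(shlex.quote(part) for part in command))
```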
The ./main.ipynb Jupyter notebook provides detailed instructions for how to run the pkt_kg algorithm and build a knowledge graph from scratch.
pkt_kg can be run using a Docker instance. In order to use the Dockerized version of the code, please make sure that you have downloaded the newest version of Docker. There are two ways to use Docker with this repository:
- Obtain Pre-Built Container from DockerHub
- Build the Container (see details below)
Obtain Pre-Built Container: A pre-built container can be obtained directly from DockerHub.
Build Container: To build the pkt_kg container, download a stable release of this repository (or fork/clone the repository). Once downloaded, you will have everything needed to build the container, including the ./Dockerfile and ./.dockerignore. The code shown below builds the container. Make sure to replace [VERSION] with the current pkt_kg version before running the code.
# run from the directory containing the Dockerfile
cd /path/to/PheKnowLator
docker build -t pkt:[VERSION] .
- Update PheKnowLator/resources/resource_info.txt, PheKnowLator/resources/edge_source_list.txt, and PheKnowLator/resources/ontology_source_list.txt
- Building the container "as-is" off of DockerHub will include a download of the data used in the latest releases. No need to update any scripts or pre-download any data.
The following code can be used to run pkt_kg from outside of the container (after obtaining a prebuilt container or after building the container locally):
docker run --name [DOCKER CONTAINER NAME] -it pkt:[VERSION] --app subclass --kg full --nde yes --rel yes --owl no --kgm yes
- The example shown above builds a full version of the knowledge graph using the subclass construction approach with node metadata and inverse relations. Because --owl is set to no, OWL semantics are not removed. See the Running the pkt Library section for more information on the parameters that can be passed to pkt_kg
- The Docker container cannot write to an encrypted filesystem, so please make sure /local/path/to/PheKnowLator/resources/knowledge_graphs references a directory that is not encrypted
In order to enable persistent data, a volume is mounted within the Dockerfile. By default, Docker names volumes using a hash. In order to find the correctly mounted volume, you can run the following:
Command 1: Obtains the volume hash:
docker inspect --format='{{json .Mounts}}' [DOCKER CONTAINER NAME] | python -m json.tool
Command 2: View data written to the volume:
sudo ls /var/lib/docker/volumes/[VOLUME HASH]/_data
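The JSON printed by Command 1 can also be parsed directly to pull out the volume hash. The following is a minimal sketch, assuming the standard field names that docker inspect reports for each mount (Type, Name, Source, Destination). The function name volume_mounts is hypothetical, and the sample values are made up for illustration; they are not output from a real PheKnowLator container.

```python
import json


def volume_mounts(mounts_json):
    """Return (name, source, destination) for each mount whose Type is 'volume'."""
    mounts = json.loads(mounts_json)
    return [
        (m.get("Name"), m.get("Source"), m.get("Destination"))
        for m in mounts
        if m.get("Type") == "volume"
    ]


if __name__ == "__main__":
    # Illustrative sample only; paste the real output of Command 1 here instead.
    sample = json.dumps([
        {
            "Type": "volume",
            "Name": "0123abcd",
            "Source": "/var/lib/docker/volumes/0123abcd/_data",
            "Destination": "/example/destination",
            "Driver": "local",
            "Mode": "",
            "RW": True,
            "Propagation": "",
        }
    ])
    for name, source, destination in volume_mounts(sample):
        print(f"volume hash: {name}")
        print(f"host path:   {source}")
        print(f"mounted at:  {destination}")
```

The Name field is the value to substitute for [VOLUME HASH] in Command 2.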
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
We’d love to hear from you! To get in touch with us, please join or start a new Discussion, create an issue, or send us an email 💌
This project is licensed under Apache License 2.0 - see the LICENSE.md file for details.
Please see our preprint: https://arxiv.org/abs/2307.05727