Estimate similarity of medical concepts based on Unified Medical Language System (UMLS)
dhchenx/umls-similarity
Estimate the similarity of medical concepts based on Unified Medical Language System (UMLS) and WordNet
- First, install a Perl environment (Strawberry Perl).
- Install MySQL and MySQL Workbench. The MySQL home folder must not contain spaces in its path.
- Download the UMLS and extract the subset.
- Go to the UMLS `META` and `NET` folders and load the UMLS data into the MySQL database with the scripts.
- Install the necessary libraries with the `cpanm` command and the `--force` flag, as below:
  - `cpanm UMLS::Interface --force`
  - `cpanm UMLS::Similarity --force`
- Errors may occur during the above process; just ignore them.
- Check whether `DBI` and `DBD::mysql` are installed; install them if not.
- Issue: `mysql.xs.dll` not found. See the link for more details.
- Solution: Copy C:\strawberry\c\bin\libmysql.dll_ to c:\strawberry\perl\vendor\lib\auto\mysql
- Finished!
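Before moving on, it can help to confirm that Perl can actually load the modules installed above. The following is a minimal sketch and is not part of `umls-similarity`: the helper names `module_check_command` and `check_perl_modules` are hypothetical, and it assumes `perl` is on your `PATH` unless you pass a path explicitly. It relies on the standard `perl -MModule -e 1` idiom, which exits with a non-zero status when the module cannot be loaded.

```python
import shutil
import subprocess

# Perl modules named in the installation steps above.
REQUIRED_MODULES = ["DBI", "DBD::mysql", "UMLS::Interface", "UMLS::Similarity"]


def module_check_command(perl_bin, module):
    """Build a command that exits with status 0 only if Perl can load `module`."""
    return [perl_bin, f"-M{module}", "-e", "1"]


def check_perl_modules(perl_bin=None, modules=REQUIRED_MODULES, run=subprocess.run):
    """Return a dict mapping each module name to True (loadable) or False."""
    perl_bin = perl_bin or shutil.which("perl")
    if perl_bin is None:
        raise FileNotFoundError("perl was not found on PATH")
    results = {}
    for module in modules:
        completed = run(
            module_check_command(perl_bin, module),
            capture_output=True,
            text=True,
        )
        results[module] = completed.returncode == 0
    return results
```

Calling `check_perl_modules()` and printing the result shows which modules still need attention. The `run` parameter exists only so the check can be exercised without a Perl installation.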
- Download WordNet 2.1 if you want to use WordNet Similarity (if not, skip these steps)
- Set the WNHome environment variable (if you need to use WordNet Similarity)
- Install `WordNet::QueryData` via the `cpanm` command in Perl
- Install `WordNet::Similarity` via the `cpanm` command in Perl
- Finished!
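A small check from Python can confirm that the environment variable is visible and points at a real folder. This is only a sketch: `find_wordnet_home` is a hypothetical helper, not part of `umls-similarity`, and the default variable name `WNHOME` is an assumption (environment variable names are case-insensitive on Windows, so it should match `WNHome` there).

```python
import os
from pathlib import Path


def find_wordnet_home(environ=os.environ, var_name="WNHOME"):
    """Return the WordNet home folder as a Path, or None if unset or missing."""
    value = environ.get(var_name)
    if not value:
        return None  # variable not set (or empty)
    path = Path(value)
    # Only report the folder if it actually exists on disk.
    return path if path.is_dir() else None
```

If the function returns `None` in a fresh terminal, the variable was probably set only for another session or points at the wrong folder.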
pip install umls-similarity
- Leacock and Chodorow (1998) referred to as lch
- Wu and Palmer (1994) referred to as wup
- Zhong, et al. (2002) referred to as zhong
- The basic path measure referred to as path
- The undirected path measure referred to as upath
- Rada, et al. (1989) referred to as cdist
- Nguyen and Al-Mubaid (2006) referred to as nam
- Resnik (1996) referred to as res
- Lin (1998) referred to as lin
- Jiang and Conrath (1997) referred to as jcn
- The vector measure referred to as vector
- Pekar and Staab (2002) referred to as pks
- Pirro and Euzenat (2010) referred to as faith
- Maedche and Staab (2001) referred to as cmatch
- Batet, et al. (2011) referred to as batet
- Sanchez, et al. (2012) referred to as sanchez
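Because a measure is passed around as a short string, a typo is easy to make and may only surface once Perl is invoked. The sketch below is one way to catch that early; `KNOWN_MEASURES` and `validate_measure` are hypothetical names, and the set is simply copied from the list above rather than queried from the library.

```python
# Short names copied from the list of measures above.
KNOWN_MEASURES = frozenset({
    "lch", "wup", "zhong", "path", "upath", "cdist", "nam", "res",
    "lin", "jcn", "vector", "pks", "faith", "cmatch", "batet", "sanchez",
})


def validate_measure(name):
    """Return the normalized measure name, or raise ValueError if it is unknown."""
    key = name.strip().lower()
    if key not in KNOWN_MEASURES:
        options = ", ".join(sorted(KNOWN_MEASURES))
        raise ValueError(f"unknown measure {name!r}; expected one of: {options}")
    return key
```

For example, `validate_measure(" LCH ")` returns `"lch"`, while `validate_measure("lhc")` raises a `ValueError` listing the valid names.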
Example Code 1: Estimate similarity between two medical concepts using UMLS
```python
from umls_similarity.umls import UMLSSimilarity
import os

if __name__ == "__main__":
    # define MySQL information that stores UMLS data in your computer
    mysql_info = {}
    mysql_info["database"] = "umls"
    mysql_info["username"] = "root"
    mysql_info["password"] = "{I am not gonna tell you}"
    mysql_info["hostname"] = "localhost"

    # Perl bin's path which will be automatically detected by the lib,
    # but you can also manually specify in its constructor
    # perl_bin_path = r"C:\Strawberry\perl\bin\perl"

    # create an instance
    umls_sim = UMLSSimilarity(mysql_info=mysql_info,
                              # perl_bin_path=''
                              )

    # show the names of all available measures so you can pass them
    # into the following `measure` parameter
    measures = umls_sim.get_all_measures()
    print(measures)

    # Directly pass two CUIs into the function below:
    sims = umls_sim.similarity(cui1="C0017601", cui2="C0232197", measure="lch")
    print(sims[0])  # only one pair with two concepts

    # Or batch process many CUI pairs from a text file where each line
    # is formatted like 'C0006949<>C0031507'
    current_path = os.path.dirname(os.path.realpath(__file__))
    sims = umls_sim.similarity_from_file(current_path + r"\cuis_umls_sim.txt", measure="lch")
    for sim in sims:
        print(sim)
```
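The batch call expects a text file with one `CUI1<>CUI2` pair per line. A minimal sketch for producing such a file is shown below; the helpers `format_pair`, `parse_pair_line`, and `write_pairs_file` are hypothetical and not part of `umls-similarity`, and the check assumes the usual CUI shape of the letter `C` followed by seven digits.

```python
import re
from pathlib import Path

# Assumed CUI shape: 'C' followed by exactly seven digits, e.g. C0006949.
CUI_PATTERN = re.compile(r"C[0-9]{7}")


def format_pair(cui1, cui2):
    """Format two CUIs as a single 'CUI1<>CUI2' line."""
    for cui in (cui1, cui2):
        if not CUI_PATTERN.fullmatch(cui):
            raise ValueError(f"not a valid CUI: {cui!r}")
    return f"{cui1}<>{cui2}"


def parse_pair_line(line):
    """Split a 'CUI1<>CUI2' line back into a (cui1, cui2) tuple."""
    cui1, sep, cui2 = line.strip().partition("<>")
    if not sep:
        raise ValueError(f"missing '<>' separator in line: {line!r}")
    return cui1, cui2


def write_pairs_file(pairs, path):
    """Write an iterable of (cui1, cui2) tuples to `path`, one pair per line."""
    lines = [format_pair(a, b) for a, b in pairs]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)
```

A file written with `write_pairs_file([("C0006949", "C0031507")], "cuis_umls_sim.txt")` can then be passed to `similarity_from_file` as in the example above.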
Example Code 2: Estimate similarity between concepts using WordNet 2.1
```python
from umls_similarity.wordnet import WNSimilarity

if __name__ == "__main__":
    wn_root_path = r"C:\Program Files (x86)\WordNet\2.1"
    # perl_bin_path = r"C:\Strawberry\perl\bin\perl"
    var1 = "dog#n#1"
    var2 = "orange#n#1"
    wn_sim = WNSimilarity(wn_root_path=wn_root_path)
    sims = wn_sim.similarity(var1, var2)
    print(sims)
    for k, v in enumerate(sims):
        print(k, '\t', v, '\t', sims[v])
```
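The inputs `dog#n#1` and `orange#n#1` follow the `word#pos#sense` convention used by the Perl WordNet modules: the word, a part-of-speech letter, and a sense number. The sketch below builds and parses such strings; `WordSense` and `parse_word_sense` are hypothetical helpers, and the accepted part-of-speech letters (`n`, `v`, `a`, `r`) are an assumption based on that convention.

```python
from typing import NamedTuple

# Assumed part-of-speech letters: noun, verb, adjective, adverb.
VALID_POS = {"n", "v", "a", "r"}


class WordSense(NamedTuple):
    word: str
    pos: str
    sense: int

    def __str__(self):
        # Render in the 'word#pos#sense' form, e.g. 'dog#n#1'.
        return f"{self.word}#{self.pos}#{self.sense}"


def parse_word_sense(text):
    """Parse a 'word#pos#sense' string into a WordSense tuple."""
    parts = text.split("#")
    if len(parts) != 3:
        raise ValueError(f"expected 'word#pos#sense', got {text!r}")
    word, pos, sense = parts
    if pos not in VALID_POS:
        raise ValueError(f"unknown part of speech {pos!r} in {text!r}")
    return WordSense(word, pos, int(sense))
```

For instance, `str(WordSense("orange", "n", 1))` gives a string that can be passed as `var2` in the example above.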
This project is a wrapper around the Perl libraries UMLS::Similarity and UMLS::Interface.
Note: Many unexpected errors may occur while installing the Perl library UMLS::Similarity, possibly because I am not an expert in Perl and its libraries.
The `umls-similarity` Python package is provided by Donghua Chen.