An R package for Nominal Data Mining Analysis
jafarilab/NIMAA
The NIMAA package [@nimaa] provides a comprehensive set of methods for performing nominal data mining.
It employs bipartite networks to demonstrate how two nominal variables are linked, and then places them in the incidence matrix to proceed with network analysis. NIMAA aids in characterizing the pattern of missing values in a dataset, locating large submatrices with non-missing values, and predicting edges within nominal variable labels. Then, given a submatrix, two unipartite networks are constructed using various network projection methods. NIMAA provides a variety of choices for clustering projected networks and selecting the best one. The best clustering results can also be used as a benchmark for imputation analysis in weighted bipartite networks.
You can install the released version of NIMAA from CRAN with:

    install.packages("NIMAA")

And the development version from GitHub with:

    # install.packages("devtools")
    devtools::install_github("jafarilab/NIMAA")

The example below loads the beatAML data shipped with the package and plots its incidence matrix:

    library(NIMAA)

    ## load the beatAML data
    beatAML_data <- NIMAA::beatAML

    # plot the original data
    beatAML_incidence_matrix <- plotIncMatrix(
      x = beatAML_data, # original data with 3 columns
      index_nominal = c(2, 1), # the first two columns are nominal data
      index_numeric = 3, # the third column is numeric data
      print_skim = FALSE, # if you want to check the skim output, set this as TRUE
      plot_weight = TRUE, # when plotting the weighted incidence matrix
      verbose = FALSE # NOT save the figures to local folder
    )
    #>
    #> Na/missing values Proportion: 0.2603

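To make the idea of an incidence matrix concrete, here is a minimal base-R sketch that does not call NIMAA. It assumes a hypothetical three-column toy table (two nominal columns and one numeric column). The values and the names `toy`, `inc`, and `missing_prop` are made up for illustration, and `plotIncMatrix()` may build its matrix differently.

```r
# Hypothetical toy table: two nominal columns and one numeric column.
# These values are invented and are NOT taken from beatAML.
toy <- data.frame(
  inhibitor  = c("drugA", "drugA", "drugB", "drugC", "drugC"),
  patient_id = c("p1", "p2", "p1", "p2", "p3"),
  score      = c(0.8, 0.5, 0.3, 0.9, 0.1),
  stringsAsFactors = FALSE
)

row_labels <- sort(unique(toy$patient_id))
col_labels <- sort(unique(toy$inhibitor))

# Start from an all-NA matrix so that unobserved pairs stay missing.
inc <- matrix(
  NA_real_,
  nrow = length(row_labels),
  ncol = length(col_labels),
  dimnames = list(row_labels, col_labels)
)

# Fill the observed (row, column) pairs using two-column character indexing.
inc[cbind(toy$patient_id, toy$inhibitor)] <- toy$score

# Proportion of missing cells, analogous in spirit to the value printed above.
missing_prop <- mean(is.na(inc))
print(inc)
print(missing_prop)
```

Each observed pair becomes one weighted edge of the bipartite network, and every `NA` cell is a pair with no recorded edge.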

    plotBipartite(inc_mat = beatAML_incidence_matrix, vertex.label.display = TRUE)

    #> IGRAPH 7cf38ef UNWB 650 47636 --
    #> + attr: name (v/c), type (v/l), shape (v/c), color (v/c), weight (e/n)
    #> + edges from 7cf38ef (vertex names):
    #>  [1] Alisertib (MLN8237)      --11-00261 Barasertib (AZD1152-HQPA)--11-00261
    #>  [3] Bortezomib (Velcade)     --11-00261 Canertinib (CI-1033)     --11-00261
    #>  [5] Crenolanib               --11-00261 CYT387                   --11-00261
    #>  [7] Dasatinib                --11-00261 Doramapimod (BIRB 796)   --11-00261
    #>  [9] Dovitinib (CHIR-258)     --11-00261 Erlotinib                --11-00261
    #> [11] Flavopiridol             --11-00261 GDC-0941                 --11-00261
    #> [13] Gefitinib                --11-00261 Go6976                   --11-00261
    #> [15] GW-2580                  --11-00261 Idelalisib               --11-00261
    #> + ... omitted several edges

The `extractSubMatrix()` function extracts the submatrices that have non-missing values or have a certain percentage of missing values inside (not for elements-max matrix), depending on the argument's input. The package vignette and help manual contain more details.

    sub_matrices <- extractSubMatrix(
      x = beatAML_incidence_matrix,
      shape = c("Square", "Rectangular_element_max"), # the selected shapes of submatrices
      row.vars = "patient_id",
      col.vars = "inhibitor",
      plot_weight = TRUE,
      print_skim = FALSE
    )
    #> binmatnest2.temperature
    #>                20.12539
    #> Size of Square: 96 rows x 96 columns
    #> Size of Rectangular_element_max: 87 rows x 140 columns

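To illustrate what "a submatrix with non-missing values" means, the sketch below uses a simple greedy heuristic in base R: it repeatedly drops the row or column with the largest share of `NA` cells until none remain. This is a swapped-in technique for illustration only, not the algorithm `extractSubMatrix()` uses, and it does not guarantee the largest possible submatrix. The function name `greedy_complete_submatrix` and the toy matrix are hypothetical.

```r
# Greedy heuristic (NOT the NIMAA algorithm): drop the row or column with the
# highest fraction of missing cells until the matrix has no NA left.
greedy_complete_submatrix <- function(m) {
  repeat {
    na_mask <- is.na(m)
    if (!any(na_mask)) break
    row_frac <- rowMeans(na_mask)
    col_frac <- colMeans(na_mask)
    if (max(row_frac) >= max(col_frac)) {
      m <- m[-which.max(row_frac), , drop = FALSE]
    } else {
      m <- m[, -which.max(col_frac), drop = FALSE]
    }
  }
  m
}

# Hypothetical toy matrix with three missing cells.
m <- matrix(
  c(1, 2, NA,
    4, 5, 6,
    7, NA, NA,
    1, 3, 5,
    2, 4, 6),
  nrow = 5, byrow = TRUE,
  dimnames = list(paste0("r", 1:5), paste0("c", 1:3))
)

sub <- greedy_complete_submatrix(m)
print(sub)
```

On this toy input the heuristic removes rows `r3` and `r1` and keeps all three columns.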
The `findCluster()` function implements seven widely used network clustering algorithms, with the option of preprocessing the input incidence matrix after projecting the bipartite network into unipartite networks. Internal and external measurements can also be used to compare the clustering algorithms. Details can be found in the package vignette and help manual.

    cls <- findCluster(
      sub_matrices$Rectangular_element_max,
      part = 1,
      method = "all", # all available clustering methods
      normalization = TRUE, # normalize the input matrix
      rm_weak_edges = TRUE, # remove the weak edges in the network
      rm_method = 'delete', # delete the weak edges instead of lowering their weights to 0.
      threshold = 'median', # Use median of edges' weights as threshold
      set_remaining_to_1 = TRUE # set the weights of remaining edges to 1
    )
    #> Warning in findCluster(sub_matrices$Rectangular_element_max, part = 1, method =
    #> "all", : cluster_spinglass cannot work with unconnected graph
    #>
    #>
    #> |             |  walktrap|   louvain|   infomap| label_prop| leading_eigen| fast_greedy|
    #> |:------------|---------:|---------:|---------:|----------:|-------------:|-----------:|
    #> |modularity   | 0.0125994| 0.0825865| 0.0000000|  0.0000000|     0.0806766|   0.0825865|
    #> |avg.silwidth | 0.2109092| 0.1134990| 0.9785714|  0.9785714|     0.1001961|   0.1134990|
    #> |coverage     | 0.9200411| 0.5866393| 1.0000000|  1.0000000|     0.5806783|   0.5866393|

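The projection and weak-edge preprocessing described above can be sketched in base R. This is one simple projection (counting shared neighbours through a matrix product) on a hypothetical binary toy matrix. The rule that "weak" means strictly below the median of the positive weights is an assumption, and `findCluster()` may project, normalize, and threshold differently.

```r
# Hypothetical binary incidence matrix: rows are one node set, columns the other.
inc <- matrix(
  c(1, 1, 1, 0,
    1, 1, 1, 1,
    0, 1, 1, 1,
    0, 0, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("p", 1:4), paste0("d", 1:4))
)

# Row projection: entry (i, j) counts the columns shared by rows i and j.
proj <- tcrossprod(inc)
diag(proj) <- 0 # drop self-loops

# Median of the positive edge weights, used here as the threshold (assumption).
w <- proj[upper.tri(proj)]
thr <- median(w[w > 0])

# Delete edges below the threshold and set the remaining weights to 1.
adj <- (proj >= thr) * 1
print(thr)
print(adj)
```

The resulting unweighted adjacency matrix is the kind of unipartite network a clustering algorithm would then receive. Here `p4` ends up isolated, the same sort of unconnected graph the spinglass warning above refers to.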
The `predictEdge()` function predicts new edges between nominal variables' labels or imputes missing values in the input data matrix using several imputation methods. We can compare the imputation results using the `validateEdgePrediction()` function to choose the best method based on a predefined benchmark. The package vignette and help manual contain more details.

    imputations <- predictEdge(
      inc_mat = beatAML_incidence_matrix,
      method = c('svd', 'median', 'als', 'CA')
    )

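As a rough picture of what imputing an incidence matrix involves, the base-R sketch below fills each missing cell with the median of its column. Taking the median per column is an assumption made for illustration. The `'median'` method in `predictEdge()` may be defined differently, and the function name `impute_col_median` and the toy matrix are hypothetical.

```r
# Fill NA cells with the median of the observed values in the same column.
# Per-column medians are an assumption, not necessarily NIMAA's behaviour.
impute_col_median <- function(m) {
  for (j in seq_len(ncol(m))) {
    miss <- is.na(m[, j])
    m[miss, j] <- median(m[, j], na.rm = TRUE)
  }
  m
}

# Hypothetical toy matrix with two missing cells.
m <- matrix(
  c(1, NA, 3,
    2, 4, NA,
    3, 6, 9),
  nrow = 3, byrow = TRUE
)

filled <- impute_col_median(m)
print(filled)
```

Every imputed cell corresponds to a predicted edge weight between a row label and a column label.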

    validateEdgePrediction(
      imputation = imputations,
      refer_community = cls$fast_greedy,
      clustering_args = cls$clustering_args
    )
    #>
    #>
    #> |       | Jaccard_similarity| Dice_similarity_coefficient| Rand_index| Minkowski (inversed)| Fowlkes_Mallows_index|
    #> |:------|------------------:|---------------------------:|----------:|--------------------:|---------------------:|
    #> |median |          0.7476353|                   0.8555964|  0.8628983|             1.870228|             0.8556407|
    #> |svd    |          0.7224792|                   0.8388829|  0.8458376|             1.763708|             0.8388853|
    #> |als    |          0.7599244|                   0.8635875|  0.8694758|             1.916772|             0.8635900|
    #> |CA     |          0.6935897|                   0.8190765|  0.8280576|             1.670030|             0.8191111|

    #>   imputation_method Jaccard_similarity Dice_similarity_coefficient Rand_index
    #> 1            median          0.7476353                   0.8555964  0.8628983
    #> 2               svd          0.7224792                   0.8388829  0.8458376
    #> 3               als          0.7599244                   0.8635875  0.8694758
    #> 4                CA          0.6935897                   0.8190765  0.8280576
    #>   Minkowski (inversed) Fowlkes_Mallows_index
    #> 1             1.870228             0.8556407
    #> 2             1.763708             0.8388853
    #> 3             1.916772             0.8635900
    #> 4             1.670030             0.8191111

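Two of the scores in the table, the Jaccard similarity and the Rand index, compare a candidate clustering against a reference clustering. The base-R sketch below uses the standard pair-counting definitions on hypothetical membership vectors. Whether `validateEdgePrediction()` computes its scores in exactly this way is an assumption, and the helper names are made up.

```r
# Count pairs of items by whether each clustering puts them in the same cluster.
pair_counts <- function(a, b) {
  stopifnot(length(a) == length(b))
  idx <- utils::combn(length(a), 2)
  same_a <- a[idx[1, ]] == a[idx[2, ]]
  same_b <- b[idx[1, ]] == b[idx[2, ]]
  c(
    both    = sum(same_a & same_b),
    only_a  = sum(same_a & !same_b),
    only_b  = sum(!same_a & same_b),
    neither = sum(!same_a & !same_b)
  )
}

# Jaccard: pairs joined in both, over pairs joined in at least one clustering.
jaccard_similarity <- function(a, b) {
  n <- pair_counts(a, b)
  n[["both"]] / (n[["both"]] + n[["only_a"]] + n[["only_b"]])
}

# Rand index: share of pairs on which the two clusterings agree.
rand_index <- function(a, b) {
  n <- pair_counts(a, b)
  (n[["both"]] + n[["neither"]]) / sum(n)
}

# Hypothetical reference and candidate cluster memberships for six items.
reference <- c(1, 1, 1, 2, 2, 2)
candidate <- c(1, 1, 2, 2, 2, 2)

jac <- jaccard_similarity(reference, candidate)
ri <- rand_index(reference, candidate)
print(c(jaccard = jac, rand = ri))
```

Both scores reach 1 when the two clusterings are identical, so under these definitions higher values mean the clustering after imputation is closer to the benchmark.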






