An R package for Nominal Data Mining Analysis


jafarilab/NIMAA


The NIMAA package [@nimaa] provides a comprehensive set of methods for performing nominal data mining.

It uses bipartite networks to show how two nominal variables are linked, and then places them in an incidence matrix for network analysis. NIMAA helps characterize the pattern of missing values in a dataset, locate large submatrices without missing values, and predict edges between the labels of the nominal variables. Given a submatrix, two unipartite networks are then constructed using various network projection methods. NIMAA offers a variety of methods for clustering the projected networks and for selecting the best one. The best clustering result can also serve as a benchmark for imputation analysis in weighted bipartite networks.
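To make the incidence-matrix idea concrete, here is a minimal base-R sketch (not NIMAA code) that turns two nominal columns into a binary incidence matrix: each row is a label of the first variable, each column is a label of the second, and a 1 marks an observed pair, i.e. an edge of the bipartite network. The data frame `toy` and its values are hypothetical.

```r
# Hypothetical long-format data with two nominal variables (not the beatAML data).
toy <- data.frame(
  patient   = c("p1", "p1", "p2", "p3", "p3"),
  inhibitor = c("drugA", "drugB", "drugA", "drugB", "drugC")
)

# Cross-tabulate the two variables; any positive count means the pair was
# observed, so it is recorded as 1 (an edge of the bipartite network).
counts  <- unclass(table(patient = toy$patient, inhibitor = toy$inhibitor))
inc_bin <- 1 * (counts > 0)
inc_bin
```

Rows with many zeros correspond to labels that were paired with few labels of the other variable, which is the kind of pattern NIMAA inspects before extracting submatrices.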

Installation

You can install the released version of NIMAA from CRAN with:

```r
install.packages("NIMAA")
```

And the development version from GitHub with:

```r
# install.packages("devtools")
devtools::install_github("jafarilab/NIMAA")
```

Example

Plotting the original data

```r
library(NIMAA)

## load the beatAML data
beatAML_data <- NIMAA::beatAML

# plot the original data
beatAML_incidence_matrix <- plotIncMatrix(
  x = beatAML_data,        # original data with 3 columns
  index_nominal = c(2, 1), # the first two columns are nominal data
  index_numeric = 3,       # the third column is numeric data
  print_skim = FALSE,      # set to TRUE to print the skim summary of the data
  plot_weight = TRUE,      # plot the weighted incidence matrix
  verbose = FALSE          # do not save the figures to a local folder
)
#>
#> Na/missing values Proportion:     0.2603
```

The beatAML dataset as an incidence matrix
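The missing-value proportion printed above can be understood with a small base-R sketch. It assumes, without checking the package source, that the proportion is the share of empty cells in the weighted incidence matrix. The toy data and the `value` column are hypothetical, and averaging duplicate pairs is only an illustrative choice.

```r
# Hypothetical long-format data: two nominal columns and one numeric column.
toy <- data.frame(
  patient   = c("p1", "p1", "p2", "p3", "p3"),
  inhibitor = c("drugA", "drugB", "drugA", "drugB", "drugC"),
  value     = c(120.5, 88.0, 140.2, 95.3, 60.1)
)

# Weighted incidence matrix: unobserved (patient, inhibitor) pairs become NA.
# Duplicate pairs, if present, would be averaged (an illustrative assumption).
inc_w <- tapply(toy$value, list(toy$patient, toy$inhibitor), mean)

# Share of empty cells, assumed to correspond to what plotIncMatrix() reports.
missing_prop <- mean(is.na(inc_w))
round(missing_prop, 4)  # 4 of the 9 cells are empty: 0.4444
```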

Plotting the bipartite network of the original data

```r
plotBipartite(inc_mat = beatAML_incidence_matrix, vertex.label.display = TRUE)
#> IGRAPH 7cf38ef UNWB 650 47636 --
#> + attr: name (v/c), type (v/l), shape (v/c), color (v/c), weight (e/n)
#> + edges from 7cf38ef (vertex names):
#>  [1] Alisertib (MLN8237)      --11-00261 Barasertib (AZD1152-HQPA)--11-00261
#>  [3] Bortezomib (Velcade)     --11-00261 Canertinib (CI-1033)     --11-00261
#>  [5] Crenolanib               --11-00261 CYT387                   --11-00261
#>  [7] Dasatinib                --11-00261 Doramapimod (BIRB 796)   --11-00261
#>  [9] Dovitinib (CHIR-258)     --11-00261 Erlotinib                --11-00261
#> [11] Flavopiridol             --11-00261 GDC-0941                 --11-00261
#> [13] Gefitinib                --11-00261 Go6976                   --11-00261
#> [15] GW-2580                  --11-00261 Idelalisib               --11-00261
#> + ... omitted several edges
```
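The IGRAPH header above describes an undirected, named, weighted, bipartite graph (`UNWB`) with 650 vertices and 47636 edges. Conceptually, every label of either nominal variable is a vertex and every non-missing cell of the weighted incidence matrix is an edge. The base-R sketch below derives such an edge list and both counts from a hypothetical toy matrix; it is not how `plotBipartite()` is implemented.

```r
# Hypothetical weighted incidence matrix; NA means the pair was never observed.
inc_w <- matrix(
  c(120.5, 88.0,   NA,
    140.2,   NA,   NA,
       NA, 95.3, 60.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"), c("drugA", "drugB", "drugC"))
)

# Each non-missing cell becomes a weighted edge between a row label and a column label.
idx <- which(!is.na(inc_w), arr.ind = TRUE)
edges <- data.frame(
  from   = rownames(inc_w)[idx[, "row"]],
  to     = colnames(inc_w)[idx[, "col"]],
  weight = inc_w[idx]
)

n_vertices <- nrow(inc_w) + ncol(inc_w)  # both label sets are vertices
n_edges    <- nrow(edges)                # one edge per observed pair
```

Here `n_vertices` is 6 and `n_edges` is 5, matching the 6 labels and the 5 non-missing cells of the toy matrix.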

Extracting large submatrices without missing values

The extractSubMatrix() function extracts submatrices that either contain no missing values or contain a specified percentage of missing values (this option does not apply to the element-max shape), depending on its arguments. The package vignette and help manual contain more details.

```r
sub_matrices <- extractSubMatrix(
  x = beatAML_incidence_matrix,
  shape = c("Square", "Rectangular_element_max"), # the selected shapes of submatrices
  row.vars = "patient_id",
  col.vars = "inhibitor",
  plot_weight = TRUE,
  print_skim = FALSE
)
#> binmatnest2.temperature
#>                20.12539
#> Size of Square:   96 rows x  96 columns
#> Size of Rectangular_element_max:      87 rows x  140 columns
```

Row-wise arrangement

Column-wise arrangement
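The search procedure used by `extractSubMatrix()` is described in the vignette and is not reproduced here. Instead, the sketch below swaps in a simple greedy heuristic, written as a hypothetical helper `greedy_complete_submatrix()`, that repeatedly drops the row or column with the largest share of missing cells until none remain. It only illustrates the goal (a fully observed submatrix) and does not guarantee the largest one.

```r
# Greedy heuristic (not NIMAA's algorithm): drop the row or column with the
# largest share of NA cells until the remaining submatrix is complete.
greedy_complete_submatrix <- function(m) {
  while (anyNA(m) && nrow(m) > 0 && ncol(m) > 0) {
    row_na <- rowMeans(is.na(m))
    col_na <- colMeans(is.na(m))
    if (max(row_na) >= max(col_na)) {
      m <- m[-which.max(row_na), , drop = FALSE]  # remove the worst row
    } else {
      m <- m[, -which.max(col_na), drop = FALSE]  # remove the worst column
    }
  }
  m
}

# Hypothetical weighted incidence matrix with missing pairs.
inc_w <- matrix(
  c(120.5, 88.0,   NA,
    140.2,   NA,   NA,
       NA, 95.3, 60.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("p1", "p2", "p3"), c("drugA", "drugB", "drugC"))
)

sub <- greedy_complete_submatrix(inc_w)
sub
```

On this toy matrix the heuristic keeps only row `p3` with columns `drugB` and `drugC`, which shows how quickly a greedy rule can shrink the data; a more careful search such as the one in the package can retain much larger blocks.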

Cluster finding analysis of projected unipartite networks

The findCluster() function implements seven widely used network clustering algorithms, which are applied to the unipartite networks obtained by projecting the bipartite network, with the option of preprocessing the input incidence matrix. Internal and external validity measures can be used to compare the clustering algorithms. Details can be found in the package vignette and help manual.

```r
cls <- findCluster(
  sub_matrices$Rectangular_element_max,
  part = 1,
  method = "all",           # all available clustering methods
  normalization = TRUE,     # normalize the input matrix
  rm_weak_edges = TRUE,     # remove the weak edges in the network
  rm_method = "delete",     # delete weak edges instead of lowering their weights to 0
  threshold = "median",     # use the median of the edge weights as the threshold
  set_remaining_to_1 = TRUE # set the weights of the remaining edges to 1
)
#> Warning in findCluster(sub_matrices$Rectangular_element_max, part = 1, method =
#> "all", : cluster_spinglass cannot work with unconnected graph
#>
#>
#> |             |  walktrap|   louvain|   infomap| label_prop| leading_eigen| fast_greedy|
#> |:------------|---------:|---------:|---------:|----------:|-------------:|-----------:|
#> |modularity   | 0.0125994| 0.0825865| 0.0000000|  0.0000000|     0.0806766|   0.0825865|
#> |avg.silwidth | 0.2109092| 0.1134990| 0.9785714|  0.9785714|     0.1001961|   0.1134990|
#> |coverage     | 0.9200411| 0.5866393| 1.0000000|  1.0000000|     0.5806783|   0.5866393|
```
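The preprocessing options in the call above can be illustrated with a base-R sketch of one plausible pipeline: min-max normalization of the incidence matrix, projection onto the row labels (assumed here to correspond to `part = 1`), deletion of edges whose weight does not exceed the median edge weight, and setting the surviving weights to 1. The matrix values, the choice of min-max normalization, and taking the median over existing edges only are all assumptions; NIMAA's internals may differ.

```r
# Hypothetical complete (NA-free) incidence matrix; rows are the projected part.
inc <- matrix(c(1, 2, 0,
                2, 1, 1,
                0, 3, 2,
                4, 0, 2),
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("p", 1:4), paste0("d", 1:3)))

# Min-max normalization to [0, 1] (an assumed choice of normalization).
inc_n <- (inc - min(inc)) / (max(inc) - min(inc))

# Weighted projection onto the rows: weight(i, j) = sum_k inc_n[i, k] * inc_n[j, k].
proj <- tcrossprod(inc_n)
diag(proj) <- 0  # drop self-loops

# Threshold at the median weight of the existing edges, delete weaker edges,
# and set the remaining weights to 1.
w   <- proj[upper.tri(proj)]
thr <- median(w[w > 0])
adj <- (proj > thr) * 1
adj
```

The resulting binary adjacency matrix `adj` keeps the edges p1-p3, p2-p3 and p2-p4; an igraph clustering function could then be run on it.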

Edge prediction in weighted bipartite networks

The predictEdge() function predicts new edges between the labels of nominal variables or, equivalently, imputes missing values in the input incidence matrix using several imputation methods. The imputation results can be compared with the validateEdgePrediction() function to choose the best method based on a predefined benchmark, such as a clustering found by findCluster(). The package vignette and help manual contain more details.

```r
imputations <- predictEdge(
  inc_mat = beatAML_incidence_matrix,
  method = c("svd", "median", "als", "CA")
)
validateEdgePrediction(
  imputation = imputations,
  refer_community = cls$fast_greedy,
  clustering_args = cls$clustering_args
)
#>
#>
#> |       | Jaccard_similarity| Dice_similarity_coefficient| Rand_index| Minkowski (inversed)| Fowlkes_Mallows_index|
#> |:------|------------------:|---------------------------:|----------:|--------------------:|---------------------:|
#> |median |          0.7476353|                   0.8555964|  0.8628983|             1.870228|             0.8556407|
#> |svd    |          0.7224792|                   0.8388829|  0.8458376|             1.763708|             0.8388853|
#> |als    |          0.7599244|                   0.8635875|  0.8694758|             1.916772|             0.8635900|
#> |CA     |          0.6935897|                   0.8190765|  0.8280576|             1.670030|             0.8191111|
#>   imputation_method Jaccard_similarity Dice_similarity_coefficient Rand_index
#> 1            median          0.7476353                   0.8555964  0.8628983
#> 2               svd          0.7224792                   0.8388829  0.8458376
#> 3               als          0.7599244                   0.8635875  0.8694758
#> 4                CA          0.6935897                   0.8190765  0.8280576
#>   Minkowski (inversed) Fowlkes_Mallows_index
#> 1             1.870228             0.8556407
#> 2             1.763708             0.8388853
#> 3             1.916772             0.8635900
#> 4             1.670030             0.8191111
```
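The similarity columns compare a reference clustering with the clustering obtained after imputation. A common way to define such indices is pair counting over all pairs of items, and the sketch below implements that standard definition for the Jaccard, Dice, Rand and Fowlkes-Mallows indices. Whether `validateEdgePrediction()` uses exactly these formulas is an assumption, and `pair_counts()` and `compare_partitions()` are hypothetical helpers.

```r
# Count item pairs by whether they share a cluster in partition x and/or y.
pair_counts <- function(x, y) {
  stopifnot(length(x) == length(y))
  pairs  <- combn(length(x), 2)
  same_x <- x[pairs[1, ]] == x[pairs[2, ]]
  same_y <- y[pairs[1, ]] == y[pairs[2, ]]
  c(n11 = sum(same_x & same_y),    # together in both partitions
    n10 = sum(same_x & !same_y),   # together only in x
    n01 = sum(!same_x & same_y),   # together only in y
    n00 = sum(!same_x & !same_y))  # apart in both partitions
}

# Pair-counting similarity indices between two partitions.
compare_partitions <- function(x, y) {
  n <- pair_counts(x, y)
  n11 <- n[["n11"]]; n10 <- n[["n10"]]; n01 <- n[["n01"]]; n00 <- n[["n00"]]
  c(jaccard         = n11 / (n11 + n10 + n01),
    dice            = 2 * n11 / (2 * n11 + n10 + n01),
    rand            = (n11 + n00) / (n11 + n10 + n01 + n00),
    fowlkes_mallows = n11 / sqrt((n11 + n10) * (n11 + n01)))
}

# Hypothetical reference and post-imputation cluster labels for six items.
reference <- c(1, 1, 1, 2, 2, 2)
imputed   <- c(1, 1, 2, 2, 2, 2)
res <- compare_partitions(reference, imputed)
round(res, 4)
```

Under these definitions Dice = 2J / (1 + J), and the Jaccard and Dice columns in the table above are consistent with that relation (for example, 2 × 0.7476 / 1.7476 ≈ 0.8556).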

License

GPLv3 License
