- `donttest`
- `mock_(class name)`, e.g., `mock_data_list` and `mock_ext_solutions_df`
- `auto_plot` output data frame doesn't duplicate the `cluster` column
- `ext_solutions_df` no longer loses the `sim_mats_list` attribute
- `donttest` rather than commented out
- `observations()`, `summary_features()`, `features()`, `uids()` marked as internal
- `rbind` for classes `solutions_df` and `ext_solutions_df` not preserving the class type of the contained `weights_matrix`
- `solutions_df` or `ext_solutions_df` restricts output to 10 lines max by default
- `solution` column in `mc_manhattan_plot()` when the extended solutions data frame has no MC labels
- weights matrix
- `merge.data_list()`
- `as.list()` for `dist_fns_list`, `clust_fns_list`, and `data_list` objects
- `generate_settings_matrix` needed
- `paste0`
- `print.solutions_df()` misprinted the number of observations in the solutions data frame
- `merge_dls()` is superseded by `merge.data_lists()`
- `ext_solutions_df` manipulation won't drop the `summary_features` and `features` attributes
- `estimate_nclust_given_graph` is more resilient to floating point errors through a `tryCatch` statement during eigengap quality assignment
- `estimate_nclust_given_graph` is more resilient to floating point errors through a `tryCatch` loop updating eigenvalue scaling
- `dplyr_row_slice()` functions for classes `solutions_df` and `ext_solutions_df`
- `extend_solutions()`
- `extend_solutions` was not assigning feature types properly during p-value calculations
- `rbind.ext_solutions_df` now takes the `...` parameter before the `reset_indices` parameter to avoid errors during calls with unnamed parameters.
- `rbind.solutions_df` now takes the `...` parameter before the `reset_indices` parameter to avoid errors during calls without named parameters.
- `snf_config` object made the weights matrix lose its class
- `list`) -> (class `data_list`, `list`)
- `data.frame`) -> solutions data frame (class `solutions_df`, `data.frame`)
- `data.frame`) -> extended solutions data frame (class `ext_solutions_df`, `data.frame`)
- `data.frame`) -> (class `ext_solutions_df`, `data.frame`)
- `list`) -> distance functions list (class `dist_fns_list`, `list`)
- `list`) -> clustering functions list (class `clust_fns_list`, `list`)
- `matrix`, `array`) -> (class `weights_matrix`, `matrix`, `array`)
- `generate_data_list()` -> `data_list()`
- `get_cluster_df()`, `get_clusters()`, `get_cluster_solutions()`) now all superseded by custom transposition of `solutions_df` class objects (i.e., simply call `t()`)
- `generate_settings_matrix()`, `generate_distance_metrics_list()`, `generate_weights_matrix()`, `generate_clust_algs_list()`) now all superseded by the single function `snf_config()` and the `snf_config` class object it produces
- `split_vector`, either by `adjusted_rand_index_heatmap()` or `shiny_annotator()`, `solutions_df` and `ext_solutions_df` class objects can be annotated with their meta cluster labels using the function `label_meta_clusters()`. This is necessary prior to using `get_representative_solutions()`.
- `as.data.frame()`
- `batch_snf` no longer changes the output structure from a solutions data frame to a list of a solutions data frame and a similarity matrix list. Instead, the similarity matrix list is added to the solutions data frame as an attribute and can be extracted using the function `sim_mats_list()`.
- `calculate_coclustering()` function
- `print()` functions have been defined for all major metasnf objects.

Last update before CRAN submission.

- `set.seed` prior to `generate_settings_matrix` instead.
- `estimate_nclust_given_graph()` occasionally yielded incorrect estimates of the number of clusters as a result of improper scaling in metasnf v0.7.0. The scaling should be corrected now.
- `mc_manhattan_plot()` with a data list containing duplicate feature names
- `mc_manhattan_plot()` parameter `rep_solution` replaced with a more accurate name
- `extended_solutions_matrix` (solutions matrix with `_pval` columns)
- `SNFtool::estimateNumberOfClustersGivenGraph()` could occasionally error out when calculating eigenvectors (eigengap heuristic) for a Laplacian with floating point values that were too small. The adapted function `estimate_nclust_given_graph()` slightly scales up the Laplacian to reduce the risk of encountering this error (presumably without any change to the resulting cluster number estimate).
- `get_matrix_order` has arguments allowing users to control which distance metric and agglomerative hierarchical clustering method are used to sort matrices
- `get_complete_uids` quickly pulls UIDs of observations with complete data from a list of data frames
- `extend_solutions` doesn't crash on multi-feature target lists
- `generate_data_list()`
- `remove_missing` parameter for `generate_data_list` allowing subjects with incomplete data to remain in the data list
- `lp_solutions_matrix` error message when the training set is not a subset of the full data list
- `generate_data_list` list elements are now named after their components
- `merge_data_lists` functionality to horizontally merge data lists
- `extend_solutions()` will no longer crash when a `data_list` has the UID column in a non-first position.
- `generate_data_list()` enforces the UID column to be in the first position of each data frame.
- `auto_plot()` will automatically generate bar and/or jitter plots showing how features in a `data_list`/`target_list` are distributed across a single cluster solution
- `shiny_annotator()` function can be used to identify indices of meta clusters within an `adjusted_rand_index_heatmap`
- `adjusted_rand_index_heatmap()` now has a `split_vector` parameter that will slice a heatmap into meta clusters
- `rename_dl()` can be used to rename features in a `data_list`
- `manhattan_plot` has been split into `var_manhattan_plot` (key variable to all variables), `esm_manhattan_plot` (cluster solutions in an extended solutions matrix to all variables), and `mc_manhattan_plot` (like `esm_manhattan_plot`, but at the meta cluster level)
- `get_representative_solutions` extracts max-ARI solutions from an extended solutions matrix based on a `split_vector` containing meta cluster boundaries
- `batch_nmi` calculates NMI scores (see https://branchlab.github.io/metasnf/articles/nmi_scores.html)
- `extend_solutions` will only calculate p-value summary measures (min/max/mean) for a `data_list` passed in as the `target_list` parameter, but will also accept and calculate p-values for a `data_list` passed in through the `data_list` parameter
- `adjusted_rand_index_heatmap` and `assoc_pval_heatmap` have updated parameters to improve ease of use and flexibility (including easier colour control)
- `get_clustered_subs` has been removed (it did the same thing as `get_cluster_df`)
- `get_cluster_pval` deprecated in favour of `calc_assoc_pval`
- `generate_data_list()` and its corresponding functions
- `remove_signal` has been renamed to `linear_adjust` to better reflect its function
- `summarize_distance_metrics_list` has been shortened to `summarize_dml`
- `correlation_pval_heatmap` has been renamed to `assoc_pval_heatmap`
- `calc_om_aris` has been renamed to `calc_aris`
- `extend_solutions` p-value calculation warnings are now suppressed
- `_pval` instead of a mix of `p_val`, `pval`, and `p`.
- `pval_select`, `p_val_select`, `top_oms_per_cluster`, `check_subj_orders_for_lp`, `get_p`, `chi_sq_pval`
- `pval_summaries`, which would calculate min/max/mean p-values, has been replaced with `summarize_pvals`
- `train_test_assign` now provides results as a named list of subject vectors instead of a `data.frame`. The `keep_split` function has been removed accordingly.
- `sort_subjects` parameter added to `generate_data_list` to allow for sorting of subjects in the `data_list`
- `extend_solutions` can now also be parallelized (see `?extend_solutions`)
- `remove_signal` function has a `sig_digs` parameter that can be used to restrict how many significant figures are returned in the resulting residuals
- `calc_om_aris` is now MUCH faster after removing excessive calls to `as.numeric` and enabling parallel processing with `future.apply`. Thanks for the idea, Alper.
- `extend_solutions` to better handle extreme p-values (e.g. infinity)
- `p_val_select` with `pval_select`, which can also return negative-log p-values
- `generate_data_list` correctly errors when components are only partially named (resolves https://github.com/BRANCHlab/metasnf/issues/10)
- `lp_row` function has been replaced by `lp_solutions_matrix`. The new function is order agnostic: full data lists can be constructed without any restriction on how training and testing set subjects are sorted. Subjects present in the provided solutions matrix to propagate are assumed to be the training subjects.
- `calc_om_aris` now has a `progress` parameter. When set to `TRUE` and used in conjunction with `progressr::with_progress()`, a progress bar is shown for the calculations. Learn more with `?calc_om_aris`.
- `grepl` instead of `grep` used in `extend_solutions` to reduce errors when no chi-squared warning occurs
- `keep_split` will preserve observations that were assigned a split but were not present in the data frame being split. Instead of being removed, those observations will have `NA` values.
- `fraction_clustered_together` crashing when a cluster was assigned to only a single observation
- `fraction_clustered_together` not running due to a bracket typo when evaluating the length of the `data_list`
- `correlation_pval_heatmap` function can have significance stars disabled with the `significance_stars` parameter
- `estimateNumberOfClustersGivenGraph` has been used up to this point without specifying a parameter for `NUMC`. Consequently, final similarity matrices clustered with the default methods (spectral clustering based on eigen-gap or rotation cost heuristics) could not yield more than 5 clusters. The default functions have been updated to span 2 to 10 clusters. Users will likely see different clustering results as a result of this change. To replicate the behaviour of default spectral clustering prior to v0.3.0, users should copy the following code prior to the `batch_snf` command:

```r
clust_algs_list <- generate_clust_algs_list(
    "spectral_eigen" = spectral_eigen_classic,
    "spectral_rot" = spectral_rot_classic
)
# Adapt below as necessary
solutions_matrix <- batch_snf(
    data_list,
    settings_matrix,
    clust_algs_list = clust_algs_list
)
```

- `fisher_exact_pval` function to avoid the "FEXACT" error (like here: https://github.com/Lagkouvardos/Rhea/issues/17). Impact on results is expected to be negligible.
- `remove_signal()` enables linearly correcting a `data_list` for confounders / unwanted signal. A vignette is available: https://branchlab.github.io/metasnf/articles/confounders.html.
- `batch_snf()` has a new parameter `automatic_standard_normalize` to switch out the default numeric distance measures (Euclidean) with standard normalized variants.
- `NEWS.md` file to track changes to the package.