
LBDiscover is an R package for literature-based discovery (LBD) in biomedical research. It provides a comprehensive suite of tools for retrieving scientific articles, extracting biomedical entities, building co-occurrence networks, and applying various discovery models to uncover hidden connections in the scientific literature.
The package implements several literature-based discovery approaches, including the ABC model, the AnC model, and Latent Semantic Indexing (LSI).
LBDiscover also provides visualization tools for exploring discovered connections as networks, heatmaps, and interactive chord diagrams.
```r
# Install from CRAN
install.packages("LBDiscover")

# Or install the development version from GitHub
# install.packages("devtools")
devtools::install_github("chaoliu-cl/LBDiscover")
```

LBDiscover provides a complete workflow for literature-based discovery:
```r
library(LBDiscover)

# Retrieve articles from PubMed
articles <- pubmed_search("migraine treatment", max_results = 100)

# Preprocess article text
preprocessed <- vec_preprocess(
  articles,
  text_column = "abstract",
  remove_stopwords = TRUE
)

# Extract biomedical entities
entities <- extract_entities_workflow(
  preprocessed,
  text_column = "abstract",
  entity_types = c("disease", "drug", "gene")
)

# Create co-occurrence matrix
co_matrix <- create_comat(
  entities,
  doc_id_col = "doc_id",
  entity_col = "entity",
  type_col = "entity_type"
)

# Apply the ABC model to find new connections
abc_results <- abc_model(
  co_matrix,
  a_term = "migraine",
  n_results = 50,
  scoring_method = "combined"
)

# Visualize the results
vis_abc_network(abc_results, top_n = 20)
```

The ABC model is based on Swanson’s discovery paradigm: if concept A is related to concept B, and concept B is related to concept C, but A and C are not directly connected in the literature, then A may have a hidden relationship with C.
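To make the ABC idea concrete, here is a minimal base-R sketch that does not call LBDiscover at all. It assumes a tiny, invented entity table (loosely echoing Swanson's migraine-magnesium example) and a deliberately simple scoring rule: each C term is scored by summing co[A, B] * co[B, C] over the shared B terms. The names `toy_entities`, `toy_co`, and `abc_sketch()` are hypothetical, and this rule is not necessarily what the `scoring_method` options of `abc_model()` compute.

```r
# Toy entity table: one row per (document, entity) pair.
# Hypothetical data, loosely modeled on Swanson's migraine-magnesium example.
toy_entities <- data.frame(
  doc_id = c(1, 1, 2, 2, 3, 3, 4, 4),
  entity = c("migraine", "serotonin",
             "migraine", "spreading depression",
             "serotonin", "magnesium",
             "spreading depression", "magnesium"),
  stringsAsFactors = FALSE
)

# Binary document-by-term incidence matrix (1 = term occurs in the document).
incidence <- unclass(table(toy_entities$doc_id, toy_entities$entity))
incidence <- (incidence > 0) * 1

# Term-by-term co-occurrence: number of documents containing both terms.
toy_co <- crossprod(incidence)
diag(toy_co) <- 0

# Simple ABC scoring (an assumption, not abc_model()'s scoring):
# B terms co-occur with A; C terms co-occur with some B but never with A.
abc_sketch <- function(co, a_term) {
  b_terms <- names(which(co[a_term, ] > 0))
  c_terms <- setdiff(colnames(co), c(a_term, b_terms))
  scores <- vapply(c_terms, function(c_term) {
    # Sum of path strengths A -> B -> C over all B terms
    sum(co[a_term, b_terms] * co[b_terms, c_term])
  }, numeric(1))
  res <- data.frame(c_term = c_terms, score = unname(scores),
                    stringsAsFactors = FALSE)
  res <- res[res$score > 0, , drop = FALSE]
  res <- res[order(-res$score), , drop = FALSE]
  rownames(res) <- NULL
  res
}

abc_sketch(toy_co, "migraine")
```

In this toy data, `magnesium` is the only candidate returned: it never co-occurs with `migraine` directly, but is linked to it through two intermediate B terms.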
```r
# Apply the ABC model
abc_results <- abc_model(
  co_matrix,
  a_term = "migraine",
  min_score = 0.1,
  n_results = 50
)

# Visualize as a network
vis_abc_network(abc_results)

# Or as a heatmap
vis_heatmap(abc_results)
```

The AnC model is an extension of the ABC model that uses multiple B terms to establish stronger connections between A and C.
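One way to read "multiple B terms" is to require that a C term be reachable from A through several distinct intermediates. The base-R sketch below assumes exactly that rule, using a hypothetical `min_b` threshold on a small invented co-occurrence matrix. `anc_sketch()`, `set_link()`, and the toy counts are illustrations only; they are not the internals of `anc_model()`, whose `n_b_terms` parameter may work differently.

```r
# Hypothetical symmetric co-occurrence counts among abstract terms.
terms <- c("A", "B1", "B2", "B3", "C1", "C2")
toy_co <- matrix(0, length(terms), length(terms),
                 dimnames = list(terms, terms))

# Helper (hypothetical) that sets a symmetric co-occurrence count.
set_link <- function(m, x, y, n) {
  m[x, y] <- n
  m[y, x] <- n
  m
}
toy_co <- set_link(toy_co, "A", "B1", 3)
toy_co <- set_link(toy_co, "A", "B2", 2)
toy_co <- set_link(toy_co, "A", "B3", 1)
toy_co <- set_link(toy_co, "B1", "C1", 2)
toy_co <- set_link(toy_co, "B2", "C1", 1)
toy_co <- set_link(toy_co, "B3", "C2", 4)

# Keep only C terms linked to A through at least `min_b` distinct B terms
# (an assumed reading of "multiple B terms", not anc_model()'s algorithm).
anc_sketch <- function(co, a_term, min_b = 2) {
  b_terms <- names(which(co[a_term, ] > 0))
  c_terms <- setdiff(colnames(co), c(a_term, b_terms))
  n_support <- vapply(c_terms, function(c_term) {
    # Number of B terms that also co-occur with this C term
    sum(co[b_terms, c_term] > 0)
  }, integer(1))
  keep <- c_terms[n_support >= min_b]
  data.frame(c_term = keep, n_b_terms = unname(n_support[keep]),
             stringsAsFactors = FALSE)
}

anc_sketch(toy_co, "A")             # only C1 is supported by two B terms
anc_sketch(toy_co, "A", min_b = 1)  # same candidates as plain ABC: C1 and C2
```

Raising `min_b` trades recall for stronger evidence: C2 is reachable only through B3, so it is dropped once two supporting B terms are required.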
```r
# Apply the AnC model
anc_results <- anc_model(
  co_matrix,
  a_term = "migraine",
  n_b_terms = 5,
  min_score = 0.1
)
```

The Latent Semantic Indexing (LSI) model identifies semantically related terms by applying dimensionality reduction (a truncated singular value decomposition) to a term-document matrix.
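A minimal base-R sketch of the LSI idea, independent of `lsi_model()`: take a truncated SVD of a small, hypothetical term-by-document matrix, place each term in the k-dimensional factor space, and rank the other terms by cosine similarity to the A term. The names `toy_tdm` and `lsi_sketch()` are hypothetical, and treating `k` as the analogue of the `n_factors` argument of `lsi_model()` is an assumption.

```r
# Toy term-by-document count matrix (hypothetical data).
toy_tdm <- matrix(
  c(1, 1, 0, 0,   # migraine
    1, 1, 0, 0,   # headache
    1, 0, 1, 0,   # aura
    0, 0, 1, 1,   # magnesium
    0, 0, 0, 1),  # diet
  nrow = 5, byrow = TRUE,
  dimnames = list(c("migraine", "headache", "aura", "magnesium", "diet"),
                  paste0("doc", 1:4))
)

# LSI sketch: truncated SVD, then cosine similarity in the k-factor space.
lsi_sketch <- function(tdm, a_term, k = 2) {
  s <- svd(tdm)
  k <- min(k, length(s$d))
  # Term coordinates in the reduced space: U_k %*% D_k
  term_vecs <- s$u[, seq_len(k), drop = FALSE] %*%
    diag(s$d[seq_len(k)], nrow = k)
  rownames(term_vecs) <- rownames(tdm)
  a_vec <- term_vecs[a_term, ]
  norms <- sqrt(rowSums(term_vecs^2))
  sims <- as.vector(term_vecs %*% a_vec) / (norms * sqrt(sum(a_vec^2)))
  names(sims) <- rownames(term_vecs)
  # Drop the A term itself; sort() also drops NaN from zero-length vectors.
  sort(sims[names(sims) != a_term], decreasing = TRUE)
}

lsi_sketch(toy_tdm, "migraine", k = 2)
```

Because `headache` has exactly the same document profile as `migraine` in this toy matrix, its cosine similarity in the reduced space is 1 (up to floating-point error). Choosing a smaller `k` merges more terms together; a larger one keeps finer distinctions.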
```r
# Create term-document matrix
tdm <- create_term_document_matrix(preprocessed)

# Apply LSI model
lsi_results <- lsi_model(
  tdm,
  a_term = "migraine",
  n_factors = 100
)
```

The package offers multiple visualization options:
```r
# Network visualization
vis_abc_network(abc_results, top_n = 25)

# Heatmap of connections
vis_heatmap(abc_results, top_n = 20)

# Export interactive HTML network
export_network(abc_results, output_file = "abc_network.html")

# Export interactive chord diagram
export_chord(abc_results, output_file = "abc_chord.html")
```

For an end-to-end analysis in a single call, use `run_lbd()`:
```r
# Run comprehensive discovery analysis
discovery_results <- run_lbd(
  search_query = "migraine pathophysiology",
  a_term = "migraine",
  discovery_approaches = c("abc", "anc", "lsi"),
  include_visualizations = TRUE,
  output_file = "discovery_report.html"
)
```

For more detailed documentation and examples, see the package vignettes:
```r
# View package vignettes
browseVignettes("LBDiscover")
```

If you use LBDiscover in your research, please cite:
Liu, C. (2025). LBDiscover: Literature-Based Discovery Tools for Biomedical Research. R package version 0.1.0. https://github.com/chaoliu-cl/LBDiscover

This project is licensed under the GPL-3 License. See the LICENSE file for details.