Information Theory and Distance Quantification with R
drostlab/philentropy
Describe and understand the world through data.
Data collection and data comparison are the foundations of scientific research. Mathematics provides the abstract framework to describe patterns we observe in nature, and statistics provides the framework to quantify the uncertainty of these patterns. In statistics, natural patterns are described in the form of probability distributions, which either follow a fixed pattern (parametric distributions) or more dynamic patterns (non-parametric distributions).
The philentropy package implements fundamental distance and similarity measures to quantify distances between probability density functions as well as traditional information theory measures. In this regard, it aims to provide a framework for comparing natural patterns in a statistical notation.
This project is born out of my passion for statistics and I hope that it will be useful to the people who share it with me.
    # install philentropy version 0.9.0 from CRAN
    install.packages("philentropy")
I am developing philentropy in my spare time and would be very grateful if you would consider citing the following paper in case philentropy was useful for your own research. I plan on maintaining and extending the philentropy functionality and usability in the next years and require citations to back up these efforts. Many thanks in advance :)
HG Drost, (2018). Philentropy: Information Theory and Distance Quantification with R. Journal of Open Source Software, 3(26), 765. https://doi.org/10.21105/joss.00765
- Introduction to the philentropy package
- Distance and Similarity Measures implemented in philentropy
- Information Theory Metrics implemented in philentropy
- Comparing many probability density functions
    library(philentropy)
    # retrieve available distance metrics
    philentropy::getDistMethods()

     [1] "euclidean"         "manhattan"         "minkowski"
     [4] "chebyshev"         "sorensen"          "gower"
     [7] "soergel"           "kulczynski_d"      "canberra"
    [10] "lorentzian"        "intersection"      "non-intersection"
    [13] "wavehedges"        "czekanowski"       "motyka"
    [16] "kulczynski_s"      "tanimoto"          "ruzicka"
    [19] "inner_product"     "harmonic_mean"     "cosine"
    [22] "hassebrook"        "jaccard"           "dice"
    [25] "fidelity"          "bhattacharyya"     "hellinger"
    [28] "matusita"          "squared_chord"     "squared_euclidean"
    [31] "pearson"           "neyman"            "squared_chi"
    [34] "prob_symm"         "divergence"        "clark"
    [37] "additive_symm"     "kullback-leibler"  "jeffreys"
    [40] "k_divergence"      "topsoe"            "jensen-shannon"
    [43] "jensen_difference" "taneja"            "kumar-johnson"
    [46] "avg"
    # define a probability density function P
    P <- 1:10/sum(1:10)
    # define a probability density function Q
    Q <- 20:29/sum(20:29)
    # combine P and Q as matrix object
    x <- rbind(P,Q)
    # compute the jensen-shannon distance between
    # probability density functions P and Q
    philentropy::distance(x, method = "jensen-shannon")

    jensen-shannon using unit 'log'.
    jensen-shannon
        0.02628933
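To make the reported value easier to interpret, the following base R sketch recomputes the Jensen-Shannon divergence of P and Q from its textbook definition, JSD(P, Q) = 0.5 * KL(P || M) + 0.5 * KL(Q || M) with M = (P + Q) / 2, using the natural logarithm (unit 'log'). The helper names `kl_div` and `js_div` are hypothetical and not part of philentropy. The sketch assumes strictly positive probability vectors and is an illustration of the measure, not the package's internal implementation.

```r
# Probability vectors from the example above
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)

# Kullback-Leibler divergence KL(p || q) in nats
# (hypothetical helper; assumes all entries of p and q are > 0)
kl_div <- function(p, q) sum(p * log(p / q))

# Jensen-Shannon divergence via the mixture distribution m = (p + q) / 2
js_div <- function(p, q) {
  m <- (p + q) / 2
  0.5 * kl_div(p, m) + 0.5 * kl_div(q, m)
}

jsd <- js_div(P, Q)
print(jsd)
```

Under these assumptions the result should agree, up to rounding, with the value that `distance(x, method = "jensen-shannon")` reports above. Dividing it by `log(2)` converts the value from nats to bits.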
Alternatively, users can also retrieve values from all available distance/similarity metrics using philentropy::dist.diversity():
    philentropy::dist.diversity(x, p = 2, unit = "log2")

            euclidean         manhattan
           0.12807130        0.35250464
            minkowski         chebyshev
           0.12807130        0.06345083
             sorensen             gower
           0.17625232        0.03525046
              soergel      kulczynski_d
           0.29968454        0.42792793
             canberra        lorentzian
           2.09927095        0.49712136
         intersection  non-intersection
           0.82374768        0.17625232
           wavehedges       czekanowski
           3.16657887        0.17625232
               motyka      kulczynski_s
           0.58812616        2.33684211
             tanimoto           ruzicka
           0.29968454        0.70031546
        inner_product     harmonic_mean
           0.10612245        0.94948528
               cosine        hassebrook
           0.93427641        0.86613103
              jaccard              dice
           0.13386897        0.07173611
             fidelity     bhattacharyya
           0.97312397        0.03930448
            hellinger          matusita
           0.32787819        0.23184489
        squared_chord squared_euclidean
           0.05375205        0.01640226
              pearson            neyman
           0.16814418        0.36742465
          squared_chi         prob_symm
           0.10102943        0.20205886
           divergence             clark
           1.49843905        0.86557468
        additive_symm  kullback-leibler
           0.53556883        0.13926288
             jeffreys      k_divergence
           0.31761069        0.04216273
               topsoe    jensen-shannon
           0.07585498        0.03792749
    jensen_difference            taneja
           0.03792749        0.04147518
        kumar-johnson               avg
           0.62779644        0.20797774
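As a sanity check on how to read this output, the simplest entries can be reproduced by hand. The sketch below uses only base R and the standard definitions of the Euclidean, Manhattan, Chebyshev and Minkowski distances. The variable and helper names are chosen for illustration and are not taken from philentropy.

```r
# Probability vectors from the example above
P <- 1:10 / sum(1:10)
Q <- 20:29 / sum(20:29)

d <- P - Q                      # element-wise differences

euclidean <- sqrt(sum(d^2))     # L2 distance
manhattan <- sum(abs(d))        # L1 distance
chebyshev <- max(abs(d))        # L-infinity distance

# Minkowski distance of order p (illustrative helper);
# p = 2 reduces to the Euclidean distance
minkowski <- function(p) sum(abs(d)^p)^(1 / p)

print(c(euclidean = euclidean,
        manhattan = manhattan,
        chebyshev = chebyshev,
        minkowski_p2 = minkowski(2)))
```

This also explains why the minkowski and euclidean entries coincide in the output above: the call passes `p = 2`, and the Minkowski distance of order 2 is the Euclidean distance.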
    # install.packages("devtools")
    # install the current version of philentropy on your system
    library(devtools)
    install_github("HajkD/philentropy", build_vignettes = TRUE, dependencies = TRUE)
The current status of the package as well as a detailed history of the functionality of each version of philentropy can be found in the NEWS section.
- distance(): Implements 46 fundamental probability distance (or similarity) measures
- getDistMethods(): Get available method names for 'distance'
- dist.diversity(): Distance Diversity between Probability Density Functions
- estimate.probability(): Estimate Probability Vectors From Count Vectors
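The measures listed above operate on probability vectors, so raw counts usually have to be normalised first. As a minimal sketch of that step, assuming a plain relative-frequency (empirical) estimate, the conversion can be written in base R as follows. The count values and the helper `empirical_prob` are hypothetical and only illustrate the idea. They do not reproduce the package's own code.

```r
# Hypothetical count vector, e.g. observed frequencies of five categories
counts <- c(12, 7, 0, 21, 10)

# Empirical (relative frequency) estimate of a probability vector
# (illustrative helper, not a philentropy function)
empirical_prob <- function(x) {
  stopifnot(is.numeric(x), all(x >= 0), sum(x) > 0)
  x / sum(x)
}

prob <- empirical_prob(counts)
print(prob)
```

The resulting vector is non-negative and sums to one, which is the form the distance and information theory functions expect.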
- H(): Shannon's Entropy H(X)
- JE(): Joint-Entropy H(X,Y)
- CE(): Conditional-Entropy H(X | Y)
- MI(): Shannon's Mutual Information I(X,Y)
- KL(): Kullback–Leibler Divergence
- JSD(): Jensen-Shannon Divergence
- gJSD(): Generalized Jensen-Shannon Divergence
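The first four quantities in this list are linked by standard identities: H(X | Y) = H(X,Y) - H(Y) and I(X,Y) = H(X) + H(Y) - H(X,Y). The base R sketch below illustrates these relations on a small joint distribution. The joint probabilities, the choice of bits (log2) as unit, and the helper name `entropy` are assumptions made for illustration. The sketch does not call or reproduce the package functions.

```r
# Hypothetical joint distribution of X (rows) and Y (columns); entries sum to 1
joint <- matrix(c(0.25, 0.25,
                  0.10, 0.40), nrow = 2, byrow = TRUE)

# Shannon entropy in bits; zero-probability cells contribute nothing
# (illustrative helper, not a philentropy function)
entropy <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

px <- rowSums(joint)            # marginal distribution of X
py <- colSums(joint)            # marginal distribution of Y

h_x  <- entropy(px)             # H(X)
h_y  <- entropy(py)             # H(Y)
h_xy <- entropy(joint)          # joint entropy H(X,Y)

h_x_given_y <- h_xy - h_y       # conditional entropy H(X | Y), chain rule
mi <- h_x + h_y - h_xy          # mutual information I(X,Y)

print(c(H_X = h_x, H_Y = h_y, H_XY = h_xy,
        H_X_given_Y = h_x_given_y, MI = mi))
```

Mutual information is zero exactly when the joint distribution factorises into its marginals, which can be checked by replacing `joint` with `outer(px, py)`.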
I would be very happy to learn more about potential improvements of the concepts and functions provided in this package.
Furthermore, in case you find some bugs or need additional (more flexible) functionality of parts of this package, please let me know:
https://github.com/drostlab/philentropy/issues
or find me on Twitter: HajkDrost