Movatterモバイル変換


[0]ホーム

URL:


Skip to main content
BMC journals have moved to Springer Nature Link.Learn more about website changes.
Springer Nature Link
Log in

Contrastive learning-based multi-mechanism disentangled assessment for drug-drug interaction

You have full access to thisopen access article

BMC Bioinformatics Aims and scope Submit manuscript

    Abstract

    Background

    Polypharmacy’s ability to circumvent acquired resistance to single drug makes it a critical strategy for treating complex diseases. However, it inevitably carries risks ofdrug-druginteractions (DDIs) that may alter pharmacological activities and potentially lead to severe adverse events or mortality. Computational assessment of drug combination has emerged as an effective approach to support clinical decision-making. Current risk identification methods focus on mining historical interaction patterns to uncover underlying mechanisms, yet face challenges from data sparsity. While data augmentation strategy can mitigate such problem, conventional approaches often introduce noise that obscures core pharmacological mechanisms, undermining safety evaluation.

    Results

    This study proposes aMulti-MechanismDisentangledDrug-drugInteraction assessment framework integrated contrastive learning, MMDDI, which includes two key components: (1) biologically-informed multi-view generation that creates high-quality augmented views, effectively addressing semantic distortion during data augmentation; (2) Mechanism-aware disentanglement that incorporates mutual information constraints to isolate interaction mechanisms from coupling of multi-modal and heterogeneous data, eliminating quantification bias. Contrastive learning integrates labeled and unlabeled data to enhance robustness against sparse observations.

    Conclusions

    Comprehensive evaluations demonstrate that MMDDI with hit@4 of 0.86 outperforms the compared baselines, with ablation studies validating the critical contributions of multi-view contrastive and mechanism disentanglement. MMDDI continues to demonstrate excellent performance in cold-start scenarios, achieving accuracy of 0.94 and recall of 0.95. Clinically, MMDDI enables interpretable causal analysis of drug interaction pathways through its mechanism-aware representations, providing operability for optimizing therapeutic regimens.

    View this article's peer review reports

    Similar content being viewed by others

    Explore related subjects

    Discover the latest articles, books and news in related subjects, suggested using machine learning.
    Use our pre-submission checklist

    Avoid common mistakes on your manuscript.

    Introduction

    Polypharmacy has emerged as prevalent therapeutic strategy for managing complex or coexisting diseases due to their ability to alleviate resistance to single drug in patients (especially cancer patients). However, distinct substructures with chemical properties in drug may interact with each other, thus altering the drug efficacy, a situation that can increase risk of death [1]. Ideally, co-medication should be synergistic and toxicity-reducing, nevertheless, a review of previous literature revealed that interactions result in the majority of adverse reactions. Recent statistics from a hospital in the US showed that serious adverse drug reactions(ADRs) were found in 6.7% of hospitalized patients, with mortality rate 0.32% [2]. Thus, accurate evaluation of DDI risk is crucial for optimizing clinical therapeutic regimens [3]. However, detecting DDIs remains challenging, which traditional experimental methods are often costly and inefficient. It’s necessary to develop novel computational methods that can assist DDI prediction. In drug design and clinical diagnosis, risk identification aims to capture potential interactions by excavating information and patterns from the history of ADRs. In this light, the more DDIs we know, the better we can take effective measures to prevent ADRs. Early, data mining methods based on medical literature and clinical records [4,5,6], in which NLP techniques are mainly used to extract information about drug interactions, then parse trees and logic rules are used for prediction based on extracted interactions between new and existing drugs. The established logic rules are based on substructure similarity, yet many dissimilar drug pairs may still share common substructure unrelated to DDI [7]. With the shift towards structuring medical records, deep learning-based methods demonstrate significant efficacy in DDI prediction [8,9,10]. SSI-DDI [8] uses multilayer graph attention network (GAT) to efficiently extract drug substructures, then constructs substructure-substructure interaction SSI enabling detailed analysis of specific components of drug pairs, and predicts potential DDI. MSAN [11], from perspective of chemical structure altered by inducer/inhibitor, discards some substructures to simulate intensity of action between drug substructures. To some extent, the abovementioned methods optimize the screening and evaluating process of drug candidates.

    These computational models can be divided into two categories: methods based on multi-source information fusion and those based on contrastive learning. A prominent approach within the first category integrates information from multiple scales [12,13], modalities [14,15,16,17], and external knowledge bases [18], thus providing solution to the limitations of single-source data modeling in meeting modern accuracy requirements. MUFFIN [12] utilizes molecular structure of drugs and semantic information from knowledge graphs, respectively, to improve DDI prediction accuracy with multi-scale feature fusion. DDIMDL [10] ensembles DNN submodels based on drug SMILES, target, enzyme and pathway, and learns cross-modal representations of drug pairs to predict ADRs. DAS-DDI [17] further integrates drug molecular graph modal information, proposing a dual-view framework with drug association and drug structure. HetDDI [18] leverage rich structural information such as drugs and external biological knowledge to pre-train heterogeneous information network models. However, the constructed DDI graph based on limited labeled data is enormous and sparse, which greatly increases the computational complexity. And deep convolution can extract features while obscuring distinctions between individual features.

    In contrastive learning-based DDI prediction methods, PHGL [14] applies three augmentation methods: atom masking, bond deletion and subgraph removal to drug molecular graph, which tailors a large mount of unlabeled data so that pretraining model to learn representative molecular structural features. Zhang et al. [19,20,21] performedcontrastivelearning(CL) of learned drug local features and global features respectively. Nonetheless, such learning paradigm cannot essentially solve the sparsity of interaction data, and fails to fully explore the potential patterns of drug-drug interactions. Therefore, Lin et al. [13,22,23] employed CL for extremely unbalanced DDI types to significantly improve performance in predicting rare types with fewer samples. Utilizing stochastic edge partitioning, Wang et al. [15] decomposed the DDI graph into local subgraphs, where a graph encoder performed CL in the subgraph representation space to yield comprehensive drug embeddings.

    However, several challenges still remain in existing studies. Drug representations obtained by supervised learning based on limited labeled data may be suboptimal. And multiple interaction mechanisms between drugs were not decoupled successfully [24]. It is hard or expensive to acquire data labels in many practical applications, whereas CL performs as a solution can improve model excellently with fewer labels. Experts in medicine have explained the internal mechanism of ADRs: drugs contain a mixture of constituents, chemical reactions such as redox, polymerization and oxidation will occur during the pharmaceutical process, the same is true for the DDI. Such as the co-medication in the patient’s body appeared in the single drug does not occur in the components, the toxic content of the medication is solubilized and lead to increased concentrations, P450 enzyme system is inhibited or induced so that the blood concentration changes, affecting the rate of metabolism and excretion eventually. DDI is mainly manifested in pharmacokinetics, which means that one drug alters the absorption, distribution, metabolism, and excretion of another drug, thus increasing or decreasing the blood concentration and target drug concentration. Related studies [25,26] summarized three main mechanisms of drug action: transport, metabolism, and plasma protein binding processes, and explained in detail the evaluation methods of different mechanisms. Guan et al. [26] designed ADMET-score to evaluate drug-likeness of a compound by combining 18 drug properties involving five aspects including ADMET. Therefore, disentangling the different aspects of drugs’ action would help physicians to take timely precautions or examine patient’s status. DDIMDL modeled the operational principles ADRs as singular aspect, ignoring the fact that DDI is a result of more than one. When modeling highly coupled mechanisms, SSI-DDI assigns importance weights skewed from expectation, so that obscure dominant mechanisms and may mislead clinical decisions.

    To address above-mentioned challenges, we proposes MMDDI, a multi-mechanism disentangled DDI assessment framework based on CL. First, MMDDI designs deep auto-encoding module that extracts information from both labeled and unlabeled data to embed these patterns into latent space. Specifically, two data augmentation operators with biological interpretation are proposed for creating different views to drug-pairs in CL module, which enrich the training data. To deal with the latter, we seek to clearly decouple DDI mechanisms. To be specific, mutual information is utilized as regularity of independence to encourage different aspects to learn information separate from each other. As such, it promotes encoder to capture semantic information well, providing interpretability for drug design and clinical consultation. Summarizing the main contributions of this paper as follows:

    1. (1)

      We propose MMDDI, a mechanism-aware joint medication risk assessor, whose multi-view contrastive learning module is equipped with biologically meaningful augmentation strategies and ultimately enhances model’s robustness.

    2. (2)

      We also develop specialized decoupling module for explicitly disentangling functioning mechanism responsible for occurrence of ADRs. It encourages specific mechanism to be able to learn characteristics from DDI samples independent of others, removing interpretability bias.

    3. (3)

      Experiments on two real datasets of different size and sparsity show that MMDDI obtains superior performance compared to baselines. Further ablation studies show that the benefit stems from the multi-view contrastive learning module and the decoupling module. In the additional DDI prediction task, MMDDI maintains outstanding performance in cold-start scenarios, achieving 93.85% accuracy and 94.96% recall.

    Discussion

    Contrastive learning

    Earliest proposed in computer vision field, contrastive learning(CL) improves the number of samples in the training set by processing original image through flip, fog, and crop, each augmentation can transform data stochastically with some internal parameters (e.g. rotation degree, noise level). Further, these images with transformations are fed to train objective recognition model. Such data augmentation operators can improve robustness and generalization, and overcome model’s dependence on labeled data [27,28,29]. Li et al. [30,31,32] tackle data sparseness in sequential recommendation that predict the next item user may click on, introducing five operations(crop, mask, disrupt, substitution, and insertion), and randomly choosing two separate views to form positive pair. The principle of CL is that driving model to learn well characterized features by comparing the differences between views. Recently, inspired by self-supervised learning, a series of studies have emerged to apply CL to learn enhanced drug-pair representations in DDI prediction tasks with limited labeled data. CL consists of two primary components: data augmentation for boosting training data and contrastive loss used as self-supervised signal. Based on this, as illustrated in Fig.1, Yuan et al. [14] proposed three augmentation strategies for molecular graph. Zhuang et al.(2023) [33] assumed that local and global features share similar semantic and proposed ADGCL that utilizes graphCL to maximize agreement between local and global representation of drug molecules, while the irrelevant semantic is minimized, as well as incorporates microsupervised learning to mitigate category imbalance. MDDI-SCL [13] encodes multi-omics similarity and takes CL in self-supervised manner to embed samples of the same type closer to each other in the latent space, which promotes performance in more fine-grained fashion. DSN-DDI [9] introduces a novel multi-view drug substructure framework that learns substructures from individual drug views and updates drug representations from drug-pair views. Whereas the practice allows for in-depth understanding of drug properties as well as drug interactions, deep graph convolution operation blurs differences between substructures. Although CL reveals new prospects, the technical perspective of augmentation has not been considered in drug interaction studies. In a recent study, Wang et al. [15] decomposes the DDI graph into multiple local subgraphs via stochastic edge partitioning and performs CL in the latent space. Although this approach reduces the computational complexity, it lacks semantic support and fails to address the performance degradation in inductive setting. MSAN [11] proposed substructure discarding with consideration that substructure changes may affect drug interactions and simulated the strength of interactions between substructures of two drugs based on similarity. The method provides an effective and intuitive way to understand DDI by directly incorporating the molecular structure of drugs. Departing from standard practice, this paper proposes two augmentation strategies, flip and shuffle, for overcoming the limitation that previous practice is difficult to retain core semantics, and ensuring interpretability of the augmented view.

    Fig. 1
    figure 1

    Main contrastive learning frameworks applied in DDI prediction

    Decoupled representation learning

    In the field of recommendation systems, scholars conduct recommendation algorithm research based on user-item historical interaction. It is widely believed that multiple underlying motivations are coupled behind the user’s behavior. Ma et al. [34] proposed MacridVAE, the first attempt to incorporate decoupled representation learning into behavioral history of users at macro and micro levels. Subsequently, sequence-to-sequence training strategy was proposed to extract additional supervised learning signals in the uncoupled potential space [35,36,37]. Sha et al. [36,38,39] argued that users make social connections with others and consume items with different motivations, which reflect user’s subjective interest in carrying out the interaction, e.g., user may interact with an item because of its specifications, layout, and shelf-expiration date, and so forth. When user wants to buy bread, shelf life may be main motivation, i.e., different motivations contribute differently to item recommendation. Therefore, we define disentanglement as detaching the integral mechanisms/motivations behind a variable into combinations of multiple independent variables, and similarly for multi-variable linear regression, which also requires that variables should be isolated from each other. Consistent with this paradigm, similarly, most DDI prediction and co-administration risk identification predominantly leverage drug-drug interaction or drug-substructure co-occurrence matrices, and existing works ignore the fact that the occurrence of ADRs may be a consequence of multiple underlying mechanisms [24,40]. For example, some chemicals may exhibit diverse mechanisms depending on their structures, Hao et al. [41] addressed this issue by developing MeTDDI, an approach that is capable of probing the main mechanisms of enzyme inhibition of chemicals, which can aid medicinal chemists in identifying metabolic sites and thus better understanding DDI’ occurring. As shown in Fig.2, drug A may potentially have adverse effects by affecting the absorption, distribution, metabolism and excretion process of drug B. Joint use of drugs is risky for reasons that glycemic changes during metabolism, distributional changes of drugs due to changes in enzyme activities in different organs, and the risk of changes in the activity of drugs competing for binding sites during absorption [24], etc. In addition, the dose, frequency and delivery route of administration may also influence DDI, leading to safety concerns. According to Peng et al. Peng et al. [25], CYPs-mediated metabolism and transporters-regulated cellular uptake constitute the dominant biological pathways underpinning clinically significant DDIs Highly engangled is detrimental to the interpretability of drug design and physician practice diagnosis. For example, assume that DDI mechanism is decoupled into metabolism\( D_{m} \) and absorption\( D_{a} \) :

    Fig. 2
    figure 2

    Toy example of multi-facet mechanisms behind drug-drug interaction network

    $$ \begin{gathered} D_{a}^{{\prime \:}} = 0.6D_{a} + 0.4D_{m} ,D_{m}^{\prime } = 0.4D_{a} + 0.6D_{m} , \hfill \\ D_{{fuse}}^{\prime } = 0.8D_{a}^{{\prime \:}} + 0.2D_{m}^{{\prime \:}} = 0.56D_{a} + 0.44D_{m} , \hfill \\ \end{gathered} $$
    (1)

    Where\(\:{\:D}_{m}^{{\prime\:}}\) and\(\:{D}_{a}^{{\prime\:}}\) denote the uncoupled representation, respectively. The final fusion weights of 0.56 and 0.44 are significantly different from expected 0.8 and 0.2. Therefore, dissociation enhances interpretability, and clarifying the dominant mechanism is more informative for clinicians’ decision-making.

    Accordingly, the most direct guidance of disentangling at pharmacodynamic mechanisms in conjunction with multi-omics drug profiles is that we attempt to disentangle ADRs rather than aggregation. This approach clarifies dominant clarifying the dominant pharmacological aspect and eliminates interpretability bias inherent in prior studies, thereby preventing misrepresentation of co-administration safety .

    Methods

    Problem formulation

    In our task, each drug is encoded with functional substructures derived from SMILES using RDKit [42]. The dataset for this study consists of the following three sets:(a) a set of SMILES-formatted stringsS, which is used to generate drug-embedding tables\( E_{1} = \left[ {e_{1} , \ldots ,e_{i} , \ldots ,e_{n} } \right] \) and\( E_{2} = \left[ {e_{1}^{\prime } , \ldots ,e_{i}^{\prime } , \ldots ,e_{n}^{\prime } } \right] \),\( e_{i} \) denotes drugi similarity features, and\( e_{i}^{\prime } \in \left\{0,1\right\}^{d} \)is a multi-hot vector and represents the substructure composition of drugi, theh-th item of\( e_{i}^{\prime } \)is 1 if drugi contains substructureh,i=1,…,n,h=1,…,d, wheren is the total number of drugs andd is the total number of RDKit-derived substructures;(b) a drug pair set of known drug-drug interactionsI={(DrugA,DrugB)|DrugA,DrugB\( \in S \)},\( m = |I| \);(c) a set of drug pairsF=(S\( \times \)S)\I, which consists of drug pairs whose interactions have not been reported,\(\tau = |F| \). In other words, known DDIs in the dataset are considered as positive samples, while unknown DDIs are treated as negative samples. Thus, for a given drugDrugA,\( S_{{DrugA}}^{ + } \) is defined as the set of drugs interacting withDrugA, oppositely,\(S_{{DrugA}}^{-} \) is defined as the set of drugs without known interactions withDrugA. Unlike determining whether there exists interaction between drugs, co-medication risk identification is primarily based on drug-pair attributes, utilizing labeled and unlabeled data to constitute triple set\( D = < DrugA,DrugB_{i} ,DrugB_{j} > DrugB_{i} \in S_{{DrugA}}^{ + } , \)   

    \(DrugB_{j} \in S_{{DrugA}}^{-} \). Thereby, the sampled drug pair\( (DrugA,\:DrugB_{i} ) \in I \) is called as positive sample, and the sampled drug pair\( (DrugA,\:DrugB_{j} ) \in F \) is called as negative sample. A candidate drug needs to be evaluated for its potential to be a perpetrator (inhibits or induces enzymes or transporters) or victim (whose pharmacokinetic is changed by a perpetrator) of DDIs with likely co-administered drugs. According to the input drug, the system outputs: (i) a prioritized list of high-risk DDIs with (ii) associated risk quantification coefficients, thereby supporting evidence-based avoidance strategies. The mathematical formulation is defined as Eq. (2). The output of\( {\text{Prob}}(\cdot) \) serves as a risk quantification coefficient, typically implemented by sigmoid function that maps the input to the probability space.

    $$ {\text{argsort}}\left( {{\text{Prob}}\left( {\left. {DrugB \in F_{{DrugA}} } \right|\left\langle {DrugA,DrugB_{i} ,DrugB_{j} } \right\rangle } \right)} \right){\text{ }}[:topk], $$
    (2)

    Drug association encoder and drug-pair encoder

    It is well known that drugs with similar chemical structures may exhibit similar activities and that similarity potentially symbolize valid information about DDI. Furthermore, similarity provides richer context for each drug, which helps to capture more complex inter-drug relationships [13,17]. Inspired by this, we use multi-hot encoding to express drug attribute composition regarding various distinct attribute categories, then design drug multi-attribute features to help the model extract latent cross-drug associations. Unlike post-fusion adopted by DDIMDL, we concatenates all features as unified DNN inputs [43], which has been proved to be an effective behavior by Deng et al. [10]. Subsequent processing employs a position-embedding-free transformer encoder(comprising two sub-layers) to exhaustively distill information embedded within the feature vectors. The integrated multi-head attention (MHA) mechanism explicitly models dependencies between drug pairs. The\(Tanimoto\) [44] coefficient is calculated by Eq. (3) to measure attribute composition similarity between drugs.

    $${Tanimoto}(S_{i} ,\:S_{j} )^{{(\omega )}} = \frac{{|\:S_{i}^{{(\omega )}} \cap S_{j}^{{(\omega )}} |}}{{|S_{i}^{{(\omega )}} \cup S_{j}^{{(\omega )}} |}} $$
    (3)

    Where\( \omega \in \{ substructure,\:target,\:enzyme,\:pathway\} \: \)denotes attribute category,\( S_{i}^{{(\omega )}} \) and\( S_{j}^{{(\omega )}} \) represent the attribute composition vector of\(\:{Drug}_{i}\) and\(\:{Drug}_{j}\) in terms of attribute category\(\:\:\omega\:\) respectively.\( \left| {S_{i}^{{(\omega )}} \cap S_{j}^{{(\omega )}} } \right| \) is the intersection size of\(\:{Drug}_{i}\) and\(\:{Drug}_{j}\) attribute composition vectors,\( \left| {S_{i}^{{(\omega )}} \cup S_{j}^{{(\omega )}} } \right| \) denotes the union size of\( Drug_{i} \)and\( Drug_{j} \) attribute composition vectors. In essence,\( {\text{Tanimoto}}(S_{i} ,S_{j} )^{{(\omega )}} \) indicates Intersection over Union of\(\:{Drug}_{i}\) and\(\:{Drug}_{j}\) attribute composition in terms of attribute category\(\:\text{ω}\). Four attribute composition similarities between\(\:{Drug}_{i}\) and\(\:{Drug}_{j}\) are concatenated horizontally to form initial drug-pair embedding. The initial drug-pair embedding is then fed separately into two-layer auto-encoder. Therefore, MMDDI is allowed to extract information from both labeled and unlabeled drug pairs and embed these patterns into latent space.

    $$z = \text{F}^{{(l)}} (\text{BN}(\sigma(W_{e} x))),\text{ }\widehat{x} = \text{F}^{{(l)}} (\text{BN}(\sigma(W_{d} z))), $$
    (4)
    $$ \mathcalligra{L}_{{re}} (x,\widehat{x}) = \frac{1}{d}\sum\nolimits_{{i = 1}}^{d} {\left( {x_{i} ,\:\widehat{x}_{i} } \right)^{2} } $$
    (5)

    In encoder of CL module, dependencies among drug attribute categories are fully captured with MHA, drug pair interaction patterns are projected into the same latent space to manifest different contributions of substructures and extract the dependencies between frequent substructures. As illustrated in Eq. (4),\( x \in {\mathbb{R}}^{{n \times d}} \)is constructed by\( Concat(e^{s} ,e^{t} ,e^{e} ,e^{p} ) \),\( W_{e} \in {\mathbb{R}}^{{k \times d}} \;{\text{and}}\;W_{d} \in {\mathbb{R}}^{{d \times k}} \:(d\ll k) \) are trainable weight matrices, BN [45] denotes batch normalization, which solves Internal Covariate Shift within model, and was shown to improve performance when added to network layers. DDI is mainly caused by few functional substructures, while the rest is less relevant. Therefore, feature compression is used to remove redundant information, minimizing Eq. (5) to ensure that important information was maximally restored and alleviate memory pressure.

    Fig. 3
    figure 3

    Framework of MMDDI model. MMDDI comprises five main components: (A) The drug encoding module, which corresponds to two entrances of the model, including drug association encoder and drug-pair encoder. The former encodes similarities of the specific drug to other drugs as features, while the latter is multi-hot encoding, with index values of 0 and 1 used to indicate whether the drug molecule contains the substructure or not. (B) The multi-view contrastive learning module incorporates flip and shuffle augmentation operators to generate biologically coherent views. Schematic illustrations further detail the embedding alignment process between positive sample pairs and negative counterparts. (C) The decoupling and risk assessment module, disentangles DDI mechanism into\(\:\text{K}\) discrete pharmacological aspects, which are then fused to output risk value through multi-head attention. (D) Base encoder, which employ two sub-layer of transformer structure but without positional encoding. (E) Dense layer.

    Multi-view contrastive learning module

    Current research on DDI based on CL not consider augmentation from technical view, so this paper introduces two biological simulation strategies, flip and shuffle, for constructing positive pairs, so as to overcome the difficulties in retaining core semantics. The SPM [7] algorithm was used to extract frequent substructures from SMILES, encoding them into fixed-dimensional vectors. This section starts with how to construct the augmented view and how to definite the positive and negative pairs, followed by the contrastive loss function.

    • (a) Flip. Flip 0 to 1, which means that DDIs change the substructure contained in the drug.\(x = [v_{1} ,v_{2} ,...,v_{n} ],v_{i} \in \left\{0,1\right\}\)if\( \mathop \sum \limits_{{k = 1}}^{n} 1~[v_{k} = 0] \ge n\alpha \), Then\(n\upalpha\:\) is taken randomly from the index that takes value 0 and flipped to 1, where\(\:{\upalpha\:}\) is hyper-parameter named flip rate. We assume that DDI involves chemical reaction that produces new substance. Such strategy could improve accuracy and recall rate of risk identification theoretically, because the enhanced view indicates that the drug possesses new function groups, which increase the likelihood of adverse effects occurring.

    • (b) Shuffle. Randomly selecting the starting index in drug vector and shuffling the segments with consecutive lengths of\(n\upalpha\:\). The operation is useful for removing interference from feature alignment by ensuring total count of 1 remains unchanged. Sample\(\:{k}\) from\( [1,~2,~...,~n - n\alpha + 1] \)as start index,\( ~shuffle([k,~k + n\alpha - 1]) \) [18]. utilized transformer without positional encoding, which does not fully consider randomness of the substructure arrangement. As a result, the view created by shuffle enhances robustness to different permutations of substructures that combines the bidirectional advantages of the BiRNN-DDI [16].

    Augmentation strategies provide more opportunities for the non-popular drug-pairs to learn. The augmented vectors are further embedded into low dimension for comparison via encoder-decoder, and theInfoNCE Loss [28] is minimized to maximize the mutual information(MI) between the positive sample pairs. Specifically,MI, illustrated by Eq. (6), is defined to measure interdependence between random variables. It measures the degree to which the uncertainty ofY is mitigated given the informationX, which is ultimately transformed into KL divergence between the product of the joint distribution and its marginal distribution. Here,X andY represent the overall variables of the outputs from the two augmented views.

    $$\: \begin{aligned} MI(X,Y) = & H(Y) - H(Y|X) = \sum\nolimits_{{x,y}} {p(x,y)} \log \frac{{p(x,y)}}{{p(x)p(y)}} = D_{{KL}} \left( {\left. {p(x,y)} \right\|p(x,y)} \right) \\ \end{aligned} $$
    (6)

    Contrastive learning loss applied as objective function to maximize the consistency of anchor nodes with positive instances, and minimize the similarity of anchor nodes to negative samples. The last layer of CL uses ReLU activation function, ReLU produces sparse representations, which further facilitates easier interpretation of the codes.

    $$\: \begin{aligned} \mathcalligra{L}_{{InfoNCE}} = & - \log \frac{{\exp (sim(q,x^{ + } ))}}{{\sum\limits_{{i = 0}}^{N} {\exp (sim(q,x_{i} ))} }} \\ & = - \log \frac{{\exp (\log p(x^{ + } ,q))}}{{\sum\limits_{{i = 0}}^{N} {\exp (\log p(x_{i} ,q))} }} \\ & = - \log \frac{{p(x^{ + } ,q)}}{{\sum\nolimits_{{i = 0}}^{N} {p(x_{i} ,q)} }} \\ \end{aligned} $$
    (7)
    $$ sim(q,x) = - E(x,q) = \log p(x,q) $$
    (8)

    Hypothesized that positive sample pairs\( p(x^{ + } ,\:q) \) are from the joint distribution, while the negative samples are independent samples based on the marginal distribution\(p(x^{-} ) \) and\(\ p(q) \)[28,46]. Equation (6) optimizes the MI between the positive and negative by maximizing the joint probability of positive sample and minimizing edge probability of negative sample, which is refined by Eqs. (89), and Fig. 3 shows more details of the process.

    $$\sum\nolimits_{{i = 0}}^{N} {p(x_{i} ,q) \approx p(x^{ + } ,q) + N\cdotp(x^{-} )p(q)} $$
    (9)
    $$ \begin{aligned} \mathcalligra{L}_{{InfoNCE}} \approx & - \log \frac{{p(x^{ + } ,q)}}{{N\cdotp(x^{0} )p(q)}} \\ & \approx - E_{{p(x,q)}} \left[ {\log \frac{{p(x,q)}}{{p(x)p(q)}}} \right] \\ \end{aligned} $$
    (10)

    Finally, in the specific implementation, the loss is transformed into the form of Eq. (10), where both\( h_{i} \) and\( h_{j} \) represent the latent representations of drug-pair output by the CL module.

    $$ {\mathcalligra{L}}_{{InfoNCE}} = - \log \left( {\frac{{\exp (h_{i} \cdot h_{j} /t)}}{{\sum\nolimits_{{h^{0} \in H_{i}^{ - } }} {\exp (h_{i} *h^{0} /\tau )} + \sum\nolimits_{{h^{ + } \in H_{i}^{ + } }} {\exp (h_{i} \cdot h^{ + } /\tau } )}}} \right) $$
    (11)

    Decoupling and risk assessment module

    Microscopically, reactions between two drugs usually involve interactions between their substructures, which are closely related to their biological activities. And it helps in potential risk assessment to understand the mechanism of action when two drugs are combined and clearly unpacked at macro. Thus, we need to associate interaction mechanism meaning to each disentangled module. The CYPs, which constitute the major enzyme family capable of catalyzing oxidative biotransformation, are of high clinical relevance for DDIs. The high permeability drugs, mainly eliminated by metabolism, are often involved in CYPs-mediated DDIs. The poorly permeable drugs, predominantly eliminated by renal and/or biliary excretion, often suffer from transporter-mediated DDIs. For different decoupled aspects, their contributions are assigned through MHA. However, previous studies [3] for the decomposition of the mechanism are accomplished by simple segmentation through neural networks without adding explicitly independent regularity, resulting in biased interpretability.

    Therefore, in our decoupling module and risk assessment module, KL divergence is applied as regularization to ensure different action features\(p\) and\(q\) derived from decoupling are independent of each other.\( K \) is the number of aspects.

    $$\begin{aligned} D_{{KL}} (p,\:q) = & \sum\nolimits_{i} {p(x_{i} )} (\log p(x_{i} ) - \log q(x_{i} ) )= E_{{x\sim p}} \left( {\log \frac{p}{q}} \right) \geq 0 \\ \end{aligned} $$
    (12)

    MMDDI maps each input sample to risk value, risk probability scores of positive and negtive samples used to calculate BPR pairwise loss, which backfires to update model parameters. BPRloss is commonly implemented for binary preference ranking, in particular for processing implicit data, where the goal is to maximize the log-likelihood of the difference in scores of the positive samples relative to the negative samples:

    $$ Loss_{{bpr}} = - \frac{1}{m}\sum\nolimits_{{k = 1}}^{m} {\log s(\widehat{r}_{{(k,\:i)?I}} \: - \:\widehat{r}_{{(k,\:j)?F}} )} $$
    (13)

    \(\sigma(x) \) is the sigmoid function that maps the score difference to [0,1], which in turn maximizes the joint probability across samples through the negative log-likelihood. The idea of applying BPR modeling in this paper is that for any\( DrugA \), the interaction\( DrugB \) corresponding to\( DrugA \) was labeled, and if\( DrugA \) acts on\( DrugB_{i} \)when both\( DrugB_{i} \)and\( DrugB_{j} \)are available, then triples\( \left\langle {DrugA,\:DrugB_{i} } \right\rangle \) are obtained, which indicates the level of risk associated with\( \left\langle {DrugA,\:DrugB_{i} } \right\rangle \) conjunction is higher than\( \left\langle {DrugA,\:DrugB_{j} } \right\rangle \).\( t \) such triples represent\( t \) training samples in total.    

    Joint training objective

    Following the most common multi-task training strategy, we had the overall loss function as follows:

    $$ \mathcalligra{L}_{{total}} = \alpha \mathcalligra{L}_{{re}} + \beta \mathcalligra{L}_{{InfoNCE}} + \gamma \mathcalligra{L}_{{ind}} + \mathcalligra{L}_{{bpr}} $$
    (14)

    Experiments

    Datasets

    A clinically relevant DDI may occur when the perpetrator affects one of the main pathway of the victim drug. DDIs arise from diverse mechanisms, including enzyme inhibition/induction, transporter-mediated processes, and substructure interactions [25]. Given their comprehensive information coverage of these mechanisms demonstrated in prior research, Dataset1 [10] and Dataset2 [22] were selected as benchmark datasets for this study. Table1 shows the details of the datasets. And both datasets provide only positive samples, negative samples are selected following the leave-one-out method in SRR-DDI [14,17,47] model. We generate negative counterparts by sampling complement set of positive drug pairs set in model training, while utilizing one-to-many strategy for more negative samples played as candidates in validation phrase.

    Table 1 Summary of datasets used in our experiments

    Pre-processing the above datasets so that they are suitable for baselines. Drugs and corresponding DDIs for which SMILES cannot be found from pubchem [48] will be excluded, single atom drugs are also removed. Finally, 540 and 1175 drugs, 34,268 and 292,046 DDIs for Dataset1 and Dataset2 were obtained, respectively. The number of drugs in dataset2 are approximately twice that of dataset1, which can validate whether MMDDI is scalable. Dataset1 was statistically analyzed and visualized as shown in Fig.4. Each drug interacted with 127 drugs on average, or 23.5%, and the short-head-long-tail distribution of DDIs suggests presence of both prevalent and cold-start drugs, which is particularly important for assessing the risk of co-administration. Structural and meta-path similarity scores between drugs were generally low, with slightly left-skewed. It indicates that limited information characterized by similarity encoder, so additional encoders are necessary.

    Fig. 4
    figure 4

    Descriptive statistical analysis of Dataset1

    Metrics

    In this paper, risk assessment for co-medication is performed in order to avoid potentially risky prescriptions, which is modeled as a top-K avoidance scenario, and hence the following metrics are applied according to user-item interaction in recommendation systems [34,35,36,38,39].

    HIT@K, serves as the top-K hit ratio metric, preserving its rank-sensitive accuracy semantics.\( DrugA \) in the test set of drug pairs as anchor, employing leave-one-out [49] to generate candidates as\( DrugB \)(i.e., one positive and randomly sampledN negative examples). The anchor is assigned an accuracy score of 1 if the positive instance ranks among the top-K predictions.

    $$ ACC@K = \frac{1}{{N_{A} }}\sum\nolimits_{{DrugA}} {[1]_{{DrugB_{{pos}} \in \arg sort(scores)[:K]}} } , $$
    (15)
    $$ NDCG@K{\text{ }} = {\text{ }}\frac{1}{{N_{A} }}\sum\nolimits_{{DrugA}} {\sum\nolimits_{{i{\text{ }} = {\text{ }}0}}^{{K{\text{ }} - {\text{ }}1}} {\frac{{r_{i} }}{{\log _{2} (i{\text{ }} + {\text{ }}2)}}} } , $$
    (16)
    $${\mathrm\:{MRR@K=}\text{ }\frac{\mathrm{1}}{N_A}\sum_{DrugA}{\frac{\mathrm{1}}{{{rank}_{pos}}\mathrm{+1}}\mathrm{\text{ .}}}}$$
    (17)

    To further reflect whether positive examples of\( DrugB \) with higher risk values appear at ahead, NDCG@K(Normalized Discounted Cumulative Gain) and MRR@K(Mean Reciprocal Rank) are introduced. Where\( r_{i} \) denotes the relevance to\(DrugA \), for simplicity, we set the relevance of positive examples to 1, otherwise to 0.\( rank_{{pos}} \) denotes rank index of in all candidates, and the lower the rank, the smaller the value, and the value of 0 when it is not in top-K list.

    When physicians or patients use co-medication prescription that has not been historically documented, MMDDI is able to assess risk of the prescription. This will assist physician to clarify the mechanism of action, ultimately determining whether the prescription is feasible and deciding intake dosage. For comparison, this article will show case study of drugs that potentially risky in combination with Fenfluramine.

    Training implementation

    Due to large number of negative samples, it is necessary to randomly sample the same number of positive samples to construct triples for training. Training process utilizes BGD from batchsize setting\( [32,~64,~128] \), and sets\( epoch = 15 \). Meanwhile, CL temperature parameter\( T = 0.05 \), learning rate\( lr = 0.00002 \), and the number of disentangling aspects\( D_{K} = 2 \) were set empirically for simplicity. Drop rate set to 0.4 and we apply 5-fold cross-validation to DDIs instead of drugs, which ensure generalization and reduce randomness caused by data division. Early stopping strategy is also applied. And all baselines follow original configuration. Figure 5 illustrates the training loss of some baselines varying with the number of backpropagation iterations. Overall, CASTER converges rapidly, MMDDI changes more smoothly, and SRRDDI suffers from fluctuations locally, but all of them tend to converge in the end.

    Fig. 5
    figure 5

    Training loss curves for MMDDI and baselines

    Baselines

    We compared MMDDI to baselines that were dealing with drug substructures and similarities, or had CL modules, respectively.

    VanillaCF: The most primitive collaborative filtering(CF) algorithm with no specific information about drugs. Recommendations are made by medication co-occurrence matrix. But it cannot cope with drug cold start.

    Similar-based [50]: Using similarity-based matrix heuristics. According to [35] it can be divided into network similarity-based method and network embedding-based method [10]. Network similarity-based method directly uses connections between drugs to calculate similarity, common similarity metrics include common neighbors, meta-paths, Jaccard coefficient, etc. Network embedding-based method maps drugs in the network to low-dimension by calculating cosine similarity. The former is suitable for simple tasks, while the latter is able to capture more complex structural information.

    PathSim1 & PathSim4 [51]: The former CF is based on the metapath similarity of substructures and is able to elaborate DDIs through substructure pairing. The latter is based on metapath similarity of multi-omics.

    NetworkSim [52]: Network similarity-based approach, is capable of capturing higher-order similarity, inferring potential DDIs through label propagation.

    NetEmb [10]: The nodes in drug homogeneous network are mapped into low-dimensional embeddings by matrix decomposition. We employ DNNs to perform DDI risk assessment, using regular terms to constrain the low-dimensional embeddings of same drug in drugA matrix and drugB matrix to be as close as possible.

    LR1 & LR4 [53,54]: Corresponding to the two inputs of MMDDI, the former uses drug structure similarity as features to predict DDI using logistic regression, while the latter uses multi-omics.

    DDIMDL [10]: Four types of drug features were used separately to construct DNN sub-models to learn cross-modal representation.

    CASTER [7]: Functional substructures that play major role are mined by SPM to effectively characterize drug. Additionally, a dictionary learning module that measures the relevance of each input substructure to the DDI results and improves interpretability of the predictions.

    MDDI-SCL [13]: Adverse events are predicted based on multi-omics similarity profile. We modified the last layer of original model setup from multi-class to single-classification that can be seen as regression.

    SRR-DDI [3]: Substructure refinement mechanism is proposed, and also utilizing multi-scale fusion.

    MMDDI learns interpretable representations, is able to offer more than just improved performance. Since metrics used in this paper are different from original article, to ensure consistency, we change the output layer of baselines to one neuron, and all other settings follow the original.

    Results

    Overall performance are presented in Table 2. The results on Dataset1 show that MMDDI obtains the optimal performance in hit rate, and the other metrics are only second to NetSim, the performance of SRRDDI based on 2D molecular structure is close to that of MMDDI, reflecting that effectiveness of mechanism disentangling in capturing the critical pathways of action. Among non-NN approaches, algorithms such as VanillaCF and NetSim, which are based only on implicit feedback of drug interactions, obtain comparable performance. Even, we found that VanillaCF performs much better than other similarity-based deep learning methods. The reason may be that structural similarity and meta-path similarity scores between interacting drugs are generally low, as shown in Fig. 4(Section Experiments), both of them are slightly left-skewed, which are not representative by encoding them as input to the neural network. VanillaCF, originally developed for recommendation systems, operates based on drug interaction co-occurrence matrices and excels at recommending popular drug combinations. In e-commerce item recommendations, this often results in excessive homogenization, failing to meet consumers’ personalized needs. In contrast, the augmentation strategies in MMDDI provide more learning opportunities for non-popular drug pairs. However, VanillaCF also has limitations. For instance, while digoxin interacts with both estrogens and thyroid preparations, VanillaCF would infer that estrogens and thyroid preparations should also interact based on the co-occurrence matrix, despite their actual pharmacological compatibility.

    Table 2 Experimental results. The highest value in each column is shown in bold. The italic values are the second-best performance baselines for each metric

    Furthermore, we observe that using all features does not lead to better results than single feature type. Results of PathSim1 and PathSim4, LR1 and LR2 illustrate that it is not the case that more detailed information improves the ability to assess risk. NN was named as black-box model, it is obvious that the more detailed information is more favorable, and the collaborative filtering algorithm fails to reflect that advantage. LR1 and LR2 validate the effectiveness of drug similarity encoding in network prediction.

    In Dataset2, the performance of baselines fused with NN all suffered decreases from different degrees, which we speculate stems from datasets’ sparsity. We split datasets into training and test based on interactions with 4:1. Both MDDISCL and MMDDI have CL module to mitigate data sparsity, and thus can show their advantages in dataset1. Despite enhanced performance in dataset2, CF algorithms are unable to handle cold-start scenarios, which greatly limits development of new drugs. Algorithms such as MMDDI can effectively avoid this shortcoming by designing a series of biological simulation processes to improve interpretability, but at the cost of performance degradation. Therefore, design of biological simulation applicable to SRRDDI or further optimization of MMDDI is needed to improve model robustness.

    Ablation study

    • To verify the impact of each component in MMDDI on its performance, five variants were designed for ablation experiments on dataset1 :

    • encoder & Fusion(Vanilla): the vanilla baseline of MMDDI, consists of only the drug encoding module and the fusion module. The fusion module concatenates the multilayer outputs of encoders at different scales.

    • Con: This variant is designed to explore performance when only CL module exists.

    • Con & De: Since MMDDI contains two entries, we examine the performance when containing the multi-view CL and decoupling module simultaneously.

    • MMDDIw/o De: In this variant the decoupling module is removed or can be seen as\( D_{K} = 1 \), similar to MDF-SA-DDI [22].

    • MMDDIw/o Con: Only CL module is removed.

    The performance of MMDDI and its variants on dataset1 is summarized in Table 3. Firstly, risk identification of drug pairs can be improved by encoding similarity, as a resultDe & Con without that encoders faced declining.Con with multi-omics feature encoding of drug-pairs better learns interaction mechanism between similar chemical substructures, and two augmentation strategies simulate this process in biological sense, with significant enhancement for improving risk warning. Further, T-SNE was employed to visualize the learned low-dimensional embeddings of samples in 2D, as shown in Fig. 6, which indicates that with CL and BPR pairwise loss exhibits excellent ability in differentiating potential DDIs. In contrast, the decoupling moduleDe contributes interpretability to MMDDI, with sacrifice in performance, but within acceptable levels.

    Table 3 Results of ablation experiments

    In short, each module’s changes and mechanisms affect the overall performance. Design of MMDDI, macroscopically carefully refining drug association features and microscopically considering potential substructure shifts, and then disentangling the mechanisms is crucial for recognizing potential DDIs and assisted decision making.

    Fig. 6
    figure 6

    Visualization of positive and negative examples in latent space

    Parameter sensitivity analysis

    Choosing of hyperparameters will affect model performance, under limited page, we mainly discuss top N and rate\(\alpha\). When we explore one of them, the others remain fixed to optimal settings. As depicted in Fig. 7(1), as batchsize increases from 32 to 128, model performance tends to increase and then stabilize, reaching the optimum when batch = 64. The effect of risky drug avoidance gradually improves with growing N. However, all metrics drop whenN = 6. We speculate that it is caused by unstable sampling strategy, and we will strive to improve in subsequent work. Flip rate\( \upalpha \) greatly affects data diversity, the MMDDI reaches optimal when\( \alpha = 0.1 \) as illustrated in Fig. 7(3). This enriches data without excessively damaging key structure information from drug SMILES, thereby improving performance. In terms of overall stability, magnitude is most affected by batchsize due to high memory requirements of MMDDI, while the flip rate has less impact on MMDDI since it increases diversity regardless of its value.    

    Fig. 7
    figure 7

    Visualization of parameter sensitivity

    Performance in DDI prediction

    To further validate the effectiveness of MMDDI in risk assessment, we employed general evaluation metrics (including Accuracy, AUC, F1-score, Precision, Recall, and AUPR) to conduct DDI classification predictions under both transductive and inductive settings. Specifically, risk coefficient threshold is set to 0.5, where drug pairs with value exceeding 0.5 were identified as having potential interactions, while others were classified as not. Three testing configurations were implemented: (S1) Based on DDIs, dataset is partitioned into training and testing sets. In this scenario, the drugs involved in the testing DDIs set are already present in the training set, which is also referred to as the transductive setting; (S2) Dataset partitioning based on drugs, dividing drugs into known and novel drug sets with 4:1, where testing drug pairs set contained one drug belonging to the novel category (i.e., no DDI records associated with this drug existed in the training set); (S3) Employing the same partitioning strategy as S2, but with both drugs in test pairs being novel. Settings S2 and S3 are collectively referred to as inductive settings, as both involve cold-start drugs. Seven advanced models from DDI prediction were selected as baselines, all of which incorporate two or more techniques from multimodal fusion, contrastive learning, graph neural networks, and disentangled representation learning. Validation results on Dataset1 are presented in Table 4, while results for Dataset2 are provided in Supplementary TableS2.

    Experimental results demonstrate that MMDDI exhibits outstanding performance across both datasets. In the transductive setting (S1), MMDDI attained 98.42% accuracy and 99.31% AUC, achieving comparable performance to DAS-DDI while significantly outperforming other baseline methods. More importantly, MMDDI maintained excellent performance when simulating cold-start scenarios for new drugs, achieving 98.69% and 93.85% accuracy in S2 and S3, respectively. In contrast to other methods experiencing steep performance declines in inductive settings (e.g., DAS dropping from 99.94% to 75.39% and 50.11%), MMDDI’s stable performance, highlights the advantages of MMDDI’s CL framework. Notably, MMDDI’s performance in S2 even surpassed that in S1. which we hypothesize is attributable to the crucial role of the drug pairs encoder and flip augmentation strategy. The maintained high recall level validates our previous conjecture that additional functional groups may increase the probability of drug interactions. In S3, most baseline metrics dropped to approximately 50% (e.g., PHGL), indicating that methods relying primarily on DDI network construction fail to learn effective representations for novel drugs under strict cold-start conditions. These findings confirm the practical value of the MMDDI framework in providing reliable references for drug safety assessment.

    Table 4 The performance of MMDDI for DDIs classification prediction on Dataset1

    Case study

    Taking Fenfluramine (DB00574) as an example, it is one of the CNS drugs used in treating Dravet Syndrome and Lennox-Gastaut Syndrome, once extensively used anti-obesity drug. We anchored Fenfluramine as inhibitor/inducer and assessed potential risk by randomly selecting 128 non-canonical drugs from Dataset1 with which it constituted administration. The top four drugs with risk factor more than 0.5 were Fulvestrant(DB00947), Disopyramide(DB00280), Efonidipine(DB09235), and Clomipramine(DB01242). We validated the output in Dataset2 and found that Fenfluramine enhances CNS depressant effects when combined with Disopyramide. Further, the independent weights of MMDDI outputs for multi-mechanisms decoupling were 0.521 and 0.479, respectively, suggesting that main mechanism of DDI is by altering activity of relevant enzymes and thus affecting the blood drug concentration. As detailed in interaction checker drugs.com [56], Coadministration with fenfluramine may decrease the plasma concentrations and therapeutic efficacy of drugs that are substrates of the CYP450 2B6 and/or CPY3A4 isoenzymes. Meanwhile, Fenfluramine may elevate blood levels of Clomipramine, leading to adverse effects such as drowsiness, blurred vision, constipation. The independent weights of the multi-mechanism were 0.484 and 0.516, respectively. By inhibiting transporters responsible for clomipramine efflux, fenfluramine may cause clomipramine accumulation, thereby increasing the susceptibility to serotonin syndrome.

    Analysis of Pearson correlations for Dataset 1 (inductive setting S3) reveals key patterns in enzyme-driven DDI mechanisms, as shown in Fig. 8. They-axes correspond to the decoupling weights of different test samples, while thex-axes’ UniPro IDs are associated with specific CYP450 enzymes(e.g., P22261 is related to a cytochrome P450 monooxygenase involved in the metabolism of polyunsaturated fatty acids (PUFA)). A small fraction of enzymes contribute to the majority of DDIs, which leads to numerous vertical light-colored stripes in the figure. The correlation value of 0 results from that the corresponding enzymes are not involved in DDIs. The positive or negative nature of the correlation also partially reflects the enzymes’ inductive or inhibitory properties. Simultaneously, low inter-enzyme correlations in the latent space (Fig. 9) confirm the model’s ability to isolate unique features for different enzymes. Supplementary materials provide further evidence of the independence among the disentangled mechanisms.

    Fig. 8
    figure 8

    Visualization of decoupled representation and protein enzyme correlations

    Fig. 9
    figure 9

    Pearson correlations map of different protein enzyme in latent embedding space

    Conclusions

    In this paper, we devote to tackle data sparsity and mechanism entanglement in co-medication risk identification. As a result, a mechanism-aware co-medication risk evaluator based on CL called MMDDI was proposed. The devised biologically meaningful augmentation strategy overcomes the challenge of limited labeled data, resulting in more robust substructure embeddings. This improvement, validated via ablation studies, is attributable to the multi-view contrastive learning module. Visualization analyses corroborate these findings, showing clear separation and well-defined decision boundaries between positive and negative samples. And the proposed MI constraint decoupling module avoids prediction bias caused by coupling in traditional methods. In the additional DDI prediction task, MMDDI maintains outstanding performance compared to the out of state baselines in inductive setting, achieving accuracy and recall of 0.94 and 0.95, respectively. The case study analyzes the top four drugs in terms of possible risk of co-administration with Fenfluramine, of which both Disopyramide and Clomipramine were validated. The degree of risk aversion of MMDDI for this drug was calculated to be 1.0, 0.63, and 0.5, respectively, and the corresponding independent weights of the MMDDI outputs provide reasonable explanations for the drug’s pathways of action. In addition to MMDDI’s interpretable assistance for co-medication assessment from different perspectives, we believe that it is easily extendable to other tasks such as drug-target interaction risk assessment.

    Our experimental design still needs to be improved. In future work, we will tackle several issues to improve accuracy of MMDDI. Toward learning more structural information about drugs, we will consider excavating from 2D and 3D structure of drug. Besides, testing on broader and larger-scale datasets such as Twosides to enable more comprehensive model validation.

    Data availability

    Two datasets are used in this work.Dataset1[10]: The first data set can be downloaded fromhttps://github.com/ShenggengLin/MDF-SA-DDI/blob/main/event.zip.Dataset2[22]:The second data set is available fromhttps://github.com/ShenggengLin/MDF-SA-DDI/blob/main/Dataset2_drug_information.zip to download. Interaction checkerDrugs.com:https://www.drugs.com/. Chemical databasePubChem:https://pubchem.ncbi.nlm.nih.gov/. The main code can be accessed athttps://github.com/YunjvZeng/MMDDI, we will further organize and refine it later.

    Abbreviations

    DDI:

    Drug-Drug Interactions

    MMDDI:

    Multi-Mechanism Disentangled Drug-drug Interaction assessment framework

    ADR:

    Adverse drug reactions

    NLP:

    Natural language processing

    SMILES:

    Simplified molecular input line entry system

    GAT:

    Graph attention network

    CL:

    Contrastive learning

    ADMET:

    Absorption, distribution, metabolism, excretion, and toxicity

    AE:

    Auto encoder

    MHA:

    Multi head attention

    ACC:

    Accuracy

    HR:

    Hit rate

    NDCG:

    Normalized discounted cumulative gain

    MRR:

    Mean reciprocal rank

    BGD:

    Batch gradient descent

    References

    1. Palleria C, Di Paolo A, Giofrè C, et al. Pharmacokinetic drug-drug interaction and their implication in clinical management. J Res Med Sciences: Official J Isfahan Univ Med Sci. 2013;18:601.

      Google Scholar 

    2. Lazarou J, Pomeranz BH, Corey PN. Incidence of adverse drug reactions in hospitalized patients: a meta-analysis of prospective studies[J]. JAMA. 1998;279(15):1200–5.

      Article CAS PubMed  Google Scholar 

    3. Niu D, Xu L, Pan S, et al. SRR-DDI: A drug–drug interaction prediction model with substructure refined representation learning based on self-attention mechanism. Knowl Based Syst. 2024;285:111337.

      Article  Google Scholar 

    4. Tari L, Anwar S, Liang S, et al. Discovering drug–drug interactions: a text-mining and reasoning approach based on properties of drug metabolism[J]. Bioinformatics. 2010;26(18):i547–53.

      Article CAS PubMed PubMed Central  Google Scholar 

    5. Duke JD, Han X, Wang Z et al. Literature based drug interaction prediction with clinical assessment using electronic medical records: novel myopathy associated drug interactions[J]. 2012.

    6. Qiu Y, Zhang Y, Deng Y, Liu S, Zhang W. A comprehensive review of computational methods for Drug-Drug interaction Detection. IEEE/ACM trans comput biol bioinform. 2022 Jul-Aug;19(4):1968–85.

    7. Huang K, Xiao C, Hoang T, et al. CASTER: predicting drug interactions with chemical substructure representation. Proc AAAI Conf Artif Intell. 2020;34:702–9.

      Google Scholar 

    8. Nyamabo AK, Yu H, Shi J-Y. SSI–DDI: substructure–substructure interactions for drug–drug interaction prediction. Brief Bioinform. 2021;22:bbab133.

      Article PubMed  Google Scholar 

    9. Li Z, Zhu S, Shao B, et al. DSN-DDI: an accurate and generalized framework for drug–drug interaction prediction by dual-view representation learning. Brief Bioinform. 2023;24:bbac597.

      Article PubMed  Google Scholar 

    10. Deng Y, Xu X, Qiu Y, et al. A multimodal deep learning framework for predicting drug–drug interaction events[J]. Bioinformatics. 2020;36(15):4316–22.

      Article CAS PubMed  Google Scholar 

    11. Zhu X, Shen Y, Lu W. Molecular Substructure-Aware network for Drug-Drug interaction prediction. Proc 31st ACM Int Conf Inform Knowl Manage. 2022;4757:4761.

      Google Scholar 

    12. Chen Y, Ma T, Yang X, et al. MUFFIN: multi-scale feature fusion for drug–drug interaction prediction. Bioinformatics. 2021;37:2651–8.

      Article CAS PubMed  Google Scholar 

    13. Lin S, Chen W, Chen G, et al. MDDI-SCL: predicting multi-type drug-drug interactions via supervised contrastive learning. J Cheminform. 2022;14:81.

      Article PubMed PubMed Central  Google Scholar 

    14. Yuan Y, Yue J, Zhang R, et al. PHGL-DDI: A pre-training based hierarchical graph learning framework for drug-drug interaction prediction. Expert Syst Appl. 2025;270:126408.

      Article  Google Scholar 

    15. Wang G, Chen H, Wang H, et al. MMDDI-SSE: A novel Multi-Modal feature fusion model with static subgraph embedding for Drug-Drug interaction event prediction. IEEE J Biomedical Health Inf. 2025;29:6081–91.

      Article PubMed  Google Scholar 

    16. Wang G, Feng H, Cao C. BiRNN-DDI: A Drug-Drug interaction event type prediction model based on bidirectional recurrent neural network and Graph2Seq representation. J Comput Biology: J Comput Mol Cell Biology. 2025;32:198–211.

      Article CAS  Google Scholar 

    17. Niu D, Zhang L, Zhang B, et al. DAS-DDI: A dual-view framework with drug association and drug structure for drug–drug interaction prediction. J Biomed Inform. 2024;156:104672.

      Article PubMed  Google Scholar 

    18. Li Z, Tu X, Chen Y, et al. HetDDI: a pre-trained heterogeneous graph neural network model for drug–drug interaction prediction. Brief Bioinform. 2023;24:bbad385.

      Article PubMed  Google Scholar 

    19. Zhang R, Wang X, Wang P, et al. HTCL-DDI: a hierarchical triple-view contrastive learning framework for drug–drug interaction prediction. Brief Bioinform. 2023;24:bbad324.

      Article PubMed  Google Scholar 

    20. Zheng M, Sun G, Fan Y. MLC-DTA: Drug-target affinity prediction based on multi-level contrastive learning and equivariant graph neural networks. Neurocomputing. 2025;637:130052.

      Article  Google Scholar 

    21. Wang J, Xiao Y, Shang X, et al. Predicting drug–target binding affinity with cross-scale graph contrastive learning. Brief Bioinform. 2023;25:bbad516.

      Article PubMed  Google Scholar 

    22. Lin S, Wang Y, Zhang L, et al. MDF-SA-DDI: predicting drug–drug interaction events based on multi-source drug fusion, multi-source feature fusion and transformer self-attention mechanism[J]. Brief Bioinform. 2022;23(1):bbab421.

      Article PubMed  Google Scholar 

    23. Xiong Z, Liu S, Huang F et al. Multi-relational contrastive learning graph neural network for drug-drug interaction event prediction. Proceedings of the AAAI Conference on Artificial Intelligence. 2023; 37:5339–5347.

    24. Hu W, Zhang W, Zhou Y, et al. MecDDI: clarified drug–Drug interaction mechanism facilitating rational drug use and potential drug–Drug interaction prediction. J Chem Inf Model. 2023;63:1626–36.

      Article CAS PubMed  Google Scholar 

    25. Peng Y, Cheng Z, Xie F. Evaluation of Pharmacokinetic Drug–Drug interactions: A review of the Mechanisms, in vitro and in Silico approaches. Metabolites. 2021;11:75.

      Article CAS PubMed PubMed Central  Google Scholar 

    26. Guan L, Yang H, Cai Y, et al. ADMET-score–a comprehensive scoring function for evaluation of chemical drug-likeness[J]. Medchemcomm. 2019;10(1):148–57.

      Article CAS PubMed  Google Scholar 

    27. Chen T, Kornblith S, Norouzi M, et al. A simple framework for contrastive learning of visual representations. Proc 37th Int Conf Mach Learn. 2020;119:1597–607.

      Google Scholar 

    28. Oord A, Li Y, Vinyals O. Representation learning with contrastive predictive coding[J]. arXiv preprint arXiv:1807.03748, 2018.

    29. Jaiswal A, Babu AR, Zadeh MZ, et al. A survey on contrastive self-supervised learning[J]. Technologies. 2020;9(1):2.

      Article  Google Scholar 

    30. Li X, Sun A, Zhao M et al. Multi-Intention Oriented Contrastive Learning for Sequential Recommendation. Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining. 2023; 411–419.

    31. Liu Z, Chen Y, Li J et al. Contrastive self-supervised sequential recommendation with robust augmentation. ArXiv 2021[J]. arXiv preprint arXiv:2108.06479, 2021.

    32. Xie X, Sun F, Liu Z et al. Contrastive learning for sequential recommendation[C]//2022 IEEE 38th international conference on data engineering (ICDE). IEEE, 2022: 1259–1273.

    33. Zhuang L, Wang H, Zhao J, et al. Adaptive dual graph contrastive learning based on heterogeneous signed network for predicting adverse drug reaction. Inf Sci. 2023;642:119139.

      Article  Google Scholar 

    34. Ma J, Zhou C, Cui P et al. Learning disentangled representations for recommendation[J]. Adv Neural Inf Process Syst, 2019, 32.

    35. Ma J, Zhou C, Yang H et al. Disentangled Self-Supervision in Sequential Recommenders. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2020; 483–491.

    36. Zheng Y, Gao C, Chang J et al. Disentangling long and short-term interests for recommendation[C]//Proceedings of the ACM web conference 2022. 2022: 2256–2267.

    37. Ye X, Li Y, Yao L. DREAM: decoupled representation via extraction attention module and supervised contrastive learning for Cross-Domain sequential recommender. Proc 17th ACM Conf Recommender Syst. 2023;479:490.

      Google Scholar 

    38. Sha X, Sun Z, Zhang J. Disentangling multi-facet social relations for recommendation. IEEE Trans Comput Social Syst. 2021;9:867–78.

      Article  Google Scholar 

    39. Sun Y, Sun Z, Sha X et al. Disentangling Motives behind Item Consumption and Social Connection for Mutually-enhanced Joint Prediction. Proceedings of the 17th ACM Conference on Recommender Systems. 2023; 613–624.

    40. Kamel A, Harriman S. Inhibition of cytochrome P450 enzymes and biochemical aspects of mechanism-based inactivation (MBI). Drug Discovery Today: Technol. 2013;10:e177–89.

      Article  Google Scholar 

    41. Hao B, Yang C, Guo L et al. Motif-based prompt learning for universal cross-domain recommendation[C]//Proceedings of the 17th ACM international conference on web search and data mining. 2024: 257–265.

    42. Landrum G, RDKit. A software suite for cheminformatics, computational chemistry, and predictive modeling[J]. Greg Landrum. 2013;8(3110):5281.

      Google Scholar 

    43. Lee G, Park C, Ahn J. Novel deep learning model for more accurate prediction of drug-drug interaction effects. BMC Bioinformatics. 2019;20:415.

      Article PubMed PubMed Central  Google Scholar 

    44. Chung NC, Miasojedow B, Startek M, et al. Jaccard/Tanimoto similarity test and Estimation methods for biological presence-absence data. BMC Bioinformatics. 2019;20:644.

      Article PubMed PubMed Central  Google Scholar 

    45. Arpit D, Zhou Y, Kota B et al. Normalization propagation: A parametric technique for removing internal covariate shift in deep networks. International Conference on Machine Learning. 2016; 1168–1176.

    46. Gutmann MU, Hyvärinen A. Noise-contrastive Estimation of unnormalized statistical models, with applications to natural image statistics. J Mach Learn Res. 2012;13:307–61.

      Google Scholar 

    47. Yang Z, Zhong W, Lv Q, et al. Learning size-adaptive molecular substructures for explainable drug–drug interaction prediction by substructure-aware graph neural network. Chem Sci. 2022;13:8693–703.

      Article CAS PubMed PubMed Central  Google Scholar 

    48. https://pubchem.ncbi.nlm.nih.gov/

    49. Cao J, Li S, Yu B et al. Towards universal cross-domain recommendation[C]//Proceedings of the Sixteenth ACM International Conference on web search and data mining. 2023: 78–86.

    50. Vilar S, Uriarte E, Santana L, et al. Similarity-based modeling in large-scale prediction of drug-drug interactions. Nat Protoc. 2014;9:2147–63.

      Article CAS PubMed PubMed Central  Google Scholar 

    51. Ren ZH, You ZH, Zou Q et al. DeepMPF: deep learning framework for predicting drug–target interactions based on multi-modal representation with meta-path semantic analysis[J]. Journal of Translational Medicine, 2023, 21(1): 48. Journal of Translational Medicine 2023; 21:48.

    52. Zhang P, Wang F, Hu J, et al. Label propagation prediction of Drug-Drug interactions based on clinical side effects. Sci Rep. 2015;5:12339.

      Article PubMed PubMed Central  Google Scholar 

    53. Gottlieb A, Stein GY, Oron Y, et al. INDI: a computational framework for inferring drug interactions and their associated recommendations. Mol Syst Biol. 2012;8:592.

      Article PubMed PubMed Central  Google Scholar 

    54. Cheng F, Zhao Z. Machine learning-based prediction of drug-drug interactions by integrating drug phenotypic, therapeutic, chemical, and genomic properties. J Am Med Inf Association: JAMIA. 2014;21:e278–286.

      Article  Google Scholar 

    55. Xu N, Wang P, Chen L et al. MR-GNN: Multi-Resolution and Dual Graph Neural Network for Predicting Structured Entity Interactions. Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. 2019; 3968–3974.

    56. https://www.drugs.com/

    Download references

    Acknowledgements

    The authors declare no conflicts of interest. We sincerely thank all anonymous reviewers for valuable suggestions. We also gratefully acknowledge the support provided by the high-performance computing platform of Guangxi University.

    Funding

    This work is supported by grants from the National Science Foundation of China (Grant No. 62362004 and 61962004).

    Author information

    Authors and Affiliations

    1. School of Computer, Electronics and Information, Guangxi University, Nanning, China

      Jinxiong Zhang, Yunjv Zeng, Chunyan Tang, Cheng Zhong, Hao Wen & Yang Liu

    2. Key Laboratory of Parallel Distributed and Intelligent Computing in Guangxi Universities and Colleges, Nanning, China

      Jinxiong Zhang, Chunyan Tang & Cheng Zhong

    Authors
    1. Jinxiong Zhang
    2. Yunjv Zeng
    3. Chunyan Tang
    4. Cheng Zhong
    5. Hao Wen
    6. Yang Liu

    Contributions

    Conceptualization: JX-Z, YJ‐Z; data collection and investigation: YJ‐Z, H-W, Y-L; formal analysis: YJ‐Z; funding acquisition: C-Z; methodology: YJ-Z; software: JX-Z, YJ‐Z; supervision: CY-T, C-Z; visualization: YJ-Z; writing original draft: YJ-Z. All authors read and approved the final manuscript.

    Corresponding authors

    Correspondence toChunyan Tang orCheng Zhong.

    Ethics declarations

    Ethics approval and consent to participate

    Not applicable.

    Consent for publication

    Not applicable.

    Competing interests

    The authors declare no competing interests.

    Additional information

    Publisher’s note

    Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

    Supplementary Information

    Below is the link to the electronic supplementary material.

    Rights and permissions

    Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visithttp://creativecommons.org/licenses/by/4.0/.

    Reprints and permissions

    About this article

    Check for updates. Verify currency and authenticity via CrossMark

    Cite this article

    Zhang, J., Zeng, Y., Tang, C.et al. Contrastive learning-based multi-mechanism disentangled assessment for drug-drug interaction.BMC Bioinformatics26, 286 (2025). https://doi.org/10.1186/s12859-025-06304-z

    Download citation

    Keywords

    Profiles

    1. Chunyan TangView author profile
    Use our pre-submission checklist

    Avoid common mistakes on your manuscript.

    Advertisement


    [8]ページ先頭

    ©2009-2025 Movatter.jp