Disclosure of Invention
In this embodiment, a method, an apparatus, and a computer device for generating antimicrobial peptides based on a generative model are provided, so as to solve the problem of low design efficiency of antimicrobial peptides in the related art.
In a first aspect, in this embodiment, there is provided a method for generating an antimicrobial peptide based on a generative model, the method comprising:
acquiring a collection of existing antimicrobial peptide sequences;
predicting a novel antimicrobial peptide based on the existing antimicrobial peptide sequences by using a pre-trained generative model to obtain a predicted sequence;
inputting the predicted sequence into a pre-trained antimicrobial peptide classifier, and judging whether the predicted sequence has antimicrobial activity, wherein the antimicrobial peptide classifier is obtained by training based on an antimicrobial peptide label set and a non-antimicrobial peptide sequence label set;
and outputting the predicted sequence with antimicrobial activity as the final target antimicrobial peptide sequence.
In some embodiments thereof, acquiring the collection of existing antimicrobial peptide sequences comprises:
extracting initial antimicrobial peptide data from a public database;
and cleaning the initial antimicrobial peptide data to obtain the collection of existing antimicrobial peptide sequences.
In some of these embodiments, predicting a novel antimicrobial peptide based on the existing antimicrobial peptide sequences using a pre-trained generative model to obtain a predicted sequence comprises:
expanding the collection of existing antimicrobial peptide sequences based on multiple sequence alignment to obtain an antimicrobial peptide multiple-sequence-alignment dataset;
and inputting the antimicrobial peptide multiple-sequence-alignment dataset into the pre-trained generative model, and generating a predicted sequence in an arbitrary decoding order.
In some of these embodiments, the pre-trained generative model employs an order-agnostic autoregressive diffusion model.
In some of these embodiments, inputting the predicted sequence into a pre-trained antimicrobial peptide classifier and determining whether the predicted sequence has antimicrobial activity comprises:
screening the predicted sequence by physicochemical properties to obtain a screened sequence;
and inputting the screened sequence into the pre-trained antimicrobial peptide classifier, and outputting a conclusion as to whether the screened sequence has antimicrobial activity.
In some of these embodiments, the method further comprises:
training a binary classification model based on the obtained antimicrobial peptide label set and non-antimicrobial peptide sequence label set;
and, during training of the binary classification model, adjusting the parameters of the binary classification model based on ten-fold cross-validation results to obtain the pre-trained antimicrobial peptide classifier.
In some of these embodiments, training the binary classification model based on the obtained antimicrobial peptide label set and non-antimicrobial peptide sequence label set comprises:
extracting features from the antimicrobial peptide label set and the non-antimicrobial peptide sequence label set to obtain antimicrobial peptide features and non-antimicrobial peptide features;
and inputting the antimicrobial peptide features and the non-antimicrobial peptide features into the binary classification model, and training the binary classification model.
In a second aspect, in this embodiment, there is provided an apparatus for generating antimicrobial peptides based on a generative model, the apparatus comprising:
an existing data acquisition module, configured to acquire a collection of existing antimicrobial peptide sequences;
a new sequence prediction module, configured to obtain, by using a pre-trained generative model, a predicted sequence structurally similar to the existing antimicrobial peptide sequences;
an activity verification module, configured to input the predicted sequence into a pre-trained antimicrobial peptide classifier and judge whether the predicted sequence has antimicrobial activity, wherein the antimicrobial peptide classifier is obtained by training based on an antimicrobial peptide label set and a non-antimicrobial peptide sequence label set;
and a target antimicrobial peptide output module, configured to output the predicted sequence with antimicrobial activity as the final target antimicrobial peptide sequence.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory and a processor, wherein the memory stores a computer program, and the processor executes the computer program to implement the method for generating an antimicrobial peptide based on a generative model according to the first aspect.
In a fourth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the antimicrobial peptide generation method based on the generative model of the first aspect.
Compared with the related art, the method, apparatus, and computer device for generating antimicrobial peptides based on a generative model provided in this embodiment solve the problem of low efficiency in antimicrobial peptide design and generation. A collection of existing antimicrobial peptide sequences is acquired; a pre-trained generative model is used to obtain a predicted sequence; the predicted sequence is input into a pre-trained antimicrobial peptide classifier, trained on an antimicrobial peptide label set and a non-antimicrobial peptide sequence label set, to judge whether it has antimicrobial activity; and the predicted sequence with antimicrobial activity is output as the final target antimicrobial peptide sequence. De novo generation and assessment of antimicrobial peptides are thus realized by means of the generative model and the classifier, improving generation efficiency and accuracy.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples for a clearer understanding of the objects, technical solutions and advantages of the present application.
Unless defined otherwise, technical or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," "these," and similar terms in this application are not intended to be limiting in number, and may be singular or plural. The terms "comprises," "comprising," "includes," "including," "having," and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or modules (units) is not limited to the listed steps or modules (units), but may include other steps or modules (units) not listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in this disclosure are not limited to physical or mechanical connections, and may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. The term "and/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, that A and B exist simultaneously, or that B exists alone. Typically, the character "/" indicates an "or" relationship between the associated objects. The terms "first," "second," "third," and the like, as used in this disclosure, merely distinguish similar objects and do not represent a particular ordering of the objects.
The method embodiments provided herein may be executed on a terminal, a computer, or a similar computing device. Taking execution on a terminal as an example, fig. 1 is a block diagram of the hardware structure of a terminal running the method for generating antimicrobial peptides based on a generative model of this embodiment. As shown in fig. 1, the terminal may include one or more processors 102 (only one is shown in fig. 1) and a memory 104 for storing data, wherein the processors 102 may include, but are not limited to, a microprocessor (MCU), a programmable logic device (FPGA), or the like. The terminal may also include a transmission device 106 for communication functions and an input/output device 108. It will be appreciated by those skilled in the art that the structure shown in fig. 1 is merely illustrative and does not limit the structure of the terminal; for example, the terminal may include more or fewer components than shown in fig. 1, or have a different configuration.
The memory 104 may be used to store computer programs, for example, software programs and modules of application software, such as the computer program corresponding to the method for generating antimicrobial peptides based on a generative model in this embodiment; the processor 102 executes the computer programs stored in the memory 104 to perform various functional applications and data processing, that is, to implement the above-described method. The memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. The network includes a wireless network provided by a communication provider of the terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, simply referred to as a NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.
In this embodiment, a method for generating antimicrobial peptides based on a generative model is provided. Fig. 2 is a flowchart of the method of this embodiment; as shown in fig. 2, the flow includes the following steps:
Step S210, acquiring a collection of existing antimicrobial peptide sequences.
Specifically, in practical applications, ways of acquiring existing antimicrobial peptide sequences include, but are not limited to, retrieving peptide data with verified antimicrobial activity from pre-stored databases or crawling antimicrobial peptide data from literature reports on network platforms, and then merging, deduplicating, and filtering the acquired sequence data to obtain the collection of existing antimicrobial peptide sequences.
Step S220, predicting novel antimicrobial peptides based on the existing antimicrobial peptide sequences by using a pre-trained generative model, to obtain a predicted sequence.
Specifically, the collection of existing antimicrobial peptide sequences is expanded based on multiple sequence alignment to obtain an antimicrobial peptide multiple-sequence-alignment dataset, and this dataset is input into a pre-trained diffusion model to generate a predicted sequence in an arbitrary decoding order. In other embodiments, step S220 may also be implemented by training a large pre-trained language model (e.g., of the GPT family) on a corpus that includes existing antimicrobial peptide sequences together with annotation information such as characteristics, functions, and similarities, so that the model learns the syntax, semantics, and latent features of antimicrobial peptide sequences and can produce a predicted sequence. In still other embodiments, candidate antimicrobial peptide sequences may be generated with a long short-term memory network (LSTM), and the candidates then input into a Transformer for decoding and optimization to obtain extended antimicrobial peptide sequences, i.e., predicted sequences, as sketched below.
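As a minimal illustration of the LSTM-based alternative just mentioned, the sketch below samples candidate peptide sequences character by character from a toy character-level LSTM. The vocabulary handling, model sizes, and sampling loop are illustrative assumptions only; in practice the network would first be trained on the collection of existing antimicrobial peptide sequences, and its candidates passed on for decoding and optimization.

```python
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BOS, EOS = len(AMINO_ACIDS), len(AMINO_ACIDS) + 1  # start/end tokens (assumed)
VOCAB = len(AMINO_ACIDS) + 2

class PeptideLSTM(nn.Module):
    """Toy character-level LSTM over the 20 amino acids (illustrative sizes)."""
    def __init__(self, embed=32, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.head = nn.Linear(hidden, VOCAB)

    def forward(self, tokens, state=None):
        h, state = self.lstm(self.embed(tokens), state)
        return self.head(h), state

@torch.no_grad()
def sample_candidate(model, max_len=35):
    """Sample one candidate autoregressively until EOS or max_len residues."""
    token = torch.tensor([[BOS]])
    state, residues = None, []
    for _ in range(max_len):
        logits, state = model(token, state)
        token = torch.multinomial(logits[0, -1].softmax(-1), 1).view(1, 1)
        if token.item() == EOS:
            break
        if token.item() < len(AMINO_ACIDS):
            residues.append(AMINO_ACIDS[token.item()])
    return "".join(residues)

model = PeptideLSTM()  # untrained here; train on the AMP dataset in practice
print(sample_candidate(model))
```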
Step S230, inputting the predicted sequence into a pre-trained antimicrobial peptide classifier, and judging whether the predicted sequence has antimicrobial activity, wherein the antimicrobial peptide classifier is obtained by training based on an antimicrobial peptide label set and a non-antimicrobial peptide sequence label set.
Specifically, the antimicrobial peptide classifier may be implemented by, but is not limited to, models such as support vector machines, neural networks, or decision trees that learn the characteristic features of antimicrobial peptides. The antimicrobial peptide label set can be obtained from the collection of existing antimicrobial peptide sequences, and the non-antimicrobial peptide sequence label set comprises various protein sequences that are not antimicrobial peptides.
Step S240, outputting the predicted sequence with antimicrobial activity as the final target antimicrobial peptide sequence.
Specifically, the target antimicrobial peptide sequence can be further processed or analyzed and, depending on the processing or analysis results, applied in different scenarios. For example, the tumor-inhibiting performance of the target sequence may be analyzed for applications in the medical field, or its bactericidal effect may be analyzed so that sequences meeting bactericidal standards can serve as preservatives in the food safety field; the method can likewise be applied in scientific research, agriculture, and other fields.
In this embodiment, a collection of existing antimicrobial peptide sequences is acquired; a pre-trained generative model is used to obtain a predicted sequence structurally similar to the existing antimicrobial peptide sequences; the predicted sequence is input into a pre-trained antimicrobial peptide classifier, trained on an antimicrobial peptide label set and a non-antimicrobial peptide sequence label set, to judge whether it has antimicrobial activity; and the predicted sequence with antimicrobial activity is output as the final target antimicrobial peptide sequence. This solves the problem of low efficiency in antimicrobial peptide design and generation and, by combining a generative model with a classifier, realizes de novo generation and assessment of antimicrobial peptides, thereby markedly improving design efficiency and success rate.
In some embodiments thereof, step S210, obtaining a set of existing antimicrobial peptide sequences, comprises:
Step S211, extracting initial antimicrobial peptide data from the public database.
Step S212, cleaning the initial antimicrobial peptide data to obtain the collection of existing antimicrobial peptide sequences.
Specifically, the collection of existing antimicrobial peptide sequences, i.e., the AMP dataset, is assembled from six common databases: APD (Antimicrobial Peptide Database), DADP (Database of Anuran Defense Peptides), DBAASP (Database of Antimicrobial Activity and Structure of Peptides), DRAMP (Database of Research on Antimicrobial Peptides), YADAMP (Yet Another Database of Antimicrobial Peptides), and dbAMP (Database of Antimicrobial Peptides). The data extracted from these databases are merged, deduplicated, and filtered to remove incomplete or meaningless entries, improving the quality of the AMP dataset; a sketch of this cleaning step follows.
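A minimal sketch of the merge/deduplicate/filter step, assuming the per-database exports have already been parsed into plain lists of one-letter sequences; the length bounds and alphabet check are illustrative filter criteria, not the exact ones used in the embodiment:

```python
VALID = set("ACDEFGHIKLMNPQRSTVWY")  # standard one-letter amino acid codes

def clean_amp_data(*database_dumps, min_len=5, max_len=100):
    """Merge several sequence lists, deduplicate, and drop incomplete entries."""
    merged = set()
    for dump in database_dumps:                      # merge all sources
        for seq in dump:
            seq = seq.strip().upper()
            if not seq or set(seq) - VALID:          # drop empty / non-standard
                continue
            if min_len <= len(seq) <= max_len:       # drop fragments / outliers
                merged.add(seq)                      # set() handles deduplication
    return sorted(merged)

apd = ["GIGKFLHSAKKFGKAFVGEIMNS"]            # magainin 2, for illustration
dramp = ["gigkflhsakkfgkafvgeimns", "KWKX"]  # duplicate + invalid entry
print(clean_amp_data(apd, dramp))            # one clean, deduplicated sequence
```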
In some embodiments, step S220, obtaining a predicted sequence structurally similar to the existing antimicrobial peptide sequences by using the pre-trained generative model, comprises:
Step S221, expanding the collection of existing antimicrobial peptide sequences based on multiple sequence alignment to obtain an antimicrobial peptide multiple-sequence-alignment dataset.
Specifically, multiple sequence alignment (MSA) is performed on the collection of existing antimicrobial peptide sequences (i.e., the AMP dataset) to obtain an antimicrobial peptide MSA dataset, AMP-MSA, carrying evolutionary information. In one embodiment, AMP-positive (i.e., antimicrobially active) MSA sequences are extracted with a search tool to obtain the AMP-MSA dataset.
Step S222, inputting the antimicrobial peptide MSA dataset into the pre-trained generative model, and generating a predicted sequence in an arbitrary decoding order.
Specifically, generation is conditioned on the antimicrobial peptide MSA dataset, the generation length is set (for example, 15-35 amino acids), and new sequences are predicted with the generative model, as in the sketch below.
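The arbitrary-order decoding described in step S222 can be sketched as follows. The predictor interface is an assumption standing in for the pre-trained, MSA-conditioned OADM; only the decoding loop itself, which fills a fully masked sequence one randomly chosen position at a time, is the point being illustrated.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def decode_any_order(predict_fn, length):
    """Fill a fully masked sequence one position at a time in a random order,
    mirroring OADM sampling: each step conditions on all residues placed so far."""
    seq = [None] * length                             # None marks a masked position
    for pos in random.sample(range(length), length):  # arbitrary decoding order
        probs = predict_fn(seq, pos)                  # model's distribution at `pos`
        seq[pos] = random.choices(AMINO_ACIDS, weights=probs)[0]
    return "".join(seq)

# Placeholder predictor (uniform); the trained, MSA-conditioned OADM goes here.
uniform = lambda seq, pos: [1.0] * len(AMINO_ACIDS)
print(decode_any_order(uniform, length=random.randint(15, 35)))
```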
The pre-trained generative model employs an order-agnostic autoregressive diffusion model (OADM). The model is trained on evolutionary multiple-sequence-alignment (MSA) data, allows sequences to be generated in any order, and extends the generative capability of traditional autoregressive models.
The OADM is trained with the log-likelihood of the generated sequence as the loss function; the log-likelihood is expressed as an expectation over all possible decoding orders:

$$\log p(x) = \mathbb{E}_{\sigma \sim U(S_L)}\left[\sum_{t=1}^{L} \log p\left(x_{\sigma(t)} \mid x_{\sigma(<t)}\right)\right]$$

where $x_{\sigma(t)}$ denotes the amino acid generated at step $t$ under decoding order $\sigma$, $x_{\sigma(<t)}$ denotes all amino acids generated before step $t$ in that order, $\log p(x)$ denotes the log-likelihood of generating sequence $x$, $\mathbb{E}_{\sigma \sim U(S_L)}$ denotes the expectation over decoding orders drawn uniformly from $S_L$, $L$ denotes the sequence length, and $S_L$ denotes the set of all possible decoding orders (permutations of $L$ positions).
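For intuition, the expectation above can be evaluated exhaustively for toy sequence lengths, since $S_L$ contains $L!$ orders. The conditional model below is a uniform placeholder, so the expectation collapses to $L \cdot \log(1/20)$; a trained OADM supplies the real conditionals.

```python
import itertools, math

def oadm_log_likelihood(x, cond_logprob):
    """E over all decoding orders sigma of sum_t log p(x_sigma(t) | x_sigma(<t)).
    Exhaustive enumeration is only feasible for toy lengths (L! orders)."""
    L = len(x)
    total = 0.0
    for sigma in itertools.permutations(range(L)):   # all decoding orders in S_L
        logp = 0.0
        observed = {}                                # positions decoded so far
        for t, pos in enumerate(sigma):
            logp += cond_logprob(x[pos], pos, dict(observed))
            observed[pos] = x[pos]
        total += logp
    return total / math.factorial(L)                 # uniform expectation over S_L

# Toy conditional: uniform over 20 amino acids regardless of context.
uniform = lambda aa, pos, ctx: math.log(1 / 20)
print(oadm_log_likelihood("KWK", uniform), 3 * math.log(1 / 20))  # both match
```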
During training, the model processes the MSA matrix with alternating axial attention mechanisms, reducing row attention complexity to $O(ML^2)$ and column attention complexity to $O(LM^2)$ (for an MSA of $M$ sequences of length $L$), and uses tied row attention to share structural information across the aligned sequences.
The model is pre-trained with a masked language modeling (MLM) objective, whose loss function is expressed as:

$$\mathcal{L}_{\mathrm{MLM}} = -\sum_{(m,i) \in \mathcal{M}} \log p\left(x_{m,i} \mid \tilde{x}; \theta\right)$$

where $(m, i)$ denotes a masked position (row $m$, column $i$ of the MSA), $\mathcal{M}$ is the set of masked positions, $p(x_{m,i} \mid \tilde{x}; \theta)$ is the probability of correctly predicting the masked amino acid at position $(m, i)$, $\tilde{x}$ is the masked MSA, $x$ denotes the MSA before masking, and $\theta$ denotes the parameters of the model.
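A compact sketch of this masked-language-modeling loss, using cross-entropy with an ignore index so that only masked positions $(m, i)$ contribute; the vocabulary size, mask rate, and random logits are placeholders for the real MSA model's setup.

```python
import torch
import torch.nn.functional as F

VOCAB, MASK_RATE = 21, 0.15     # 20 amino acids + a mask token (assumed setup)

def mlm_loss(logits, targets, mask):
    """Cross-entropy over masked positions only: -sum log p(x_{m,i} | x_tilde)."""
    targets = targets.masked_fill(~mask, -100)  # -100 = cross_entropy ignore_index
    return F.cross_entropy(logits.view(-1, VOCAB), targets.view(-1))

M, L = 8, 30                                    # MSA: M sequences of length L
targets = torch.randint(0, 20, (M, L))          # true residues x
mask = torch.rand(M, L) < MASK_RATE             # masked positions (m, i)
logits = torch.randn(M, L, VOCAB)               # model output on masked MSA x_tilde
print(mlm_loss(logits, targets, mask))
```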
In some of these embodiments, step S230, inputting the predicted sequence into a pre-trained antimicrobial peptide classifier and judging whether the predicted sequence has antimicrobial activity, comprises:
Step S231, screening the predicted sequence by physicochemical properties to obtain a screened sequence.
Step S232, inputting the screened sequence into the pre-trained antimicrobial peptide classifier, and outputting a conclusion as to whether the screened sequence has antimicrobial activity.
Specifically, the screening conditions include isoelectric point, positive charge, hydrophobicity, and the like. Physicochemical screening evaluates whether the sequences produced by the generative model have AMP-like (antimicrobial peptide) properties; a simple illustration follows.
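The screening step can be illustrated with simple sequence-derived quantities. The net-charge approximation (K/R minus D/E counts near neutral pH) and the Kyte-Doolittle hydropathy window below are illustrative stand-ins for the exact isoelectric-point and hydrophobicity criteria applied in practice.

```python
# Kyte-Doolittle hydropathy index per residue
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def net_charge(seq):
    """Crude net charge near pH 7: basic (K, R) minus acidic (D, E) residues."""
    return sum(seq.count(a) for a in "KR") - sum(seq.count(a) for a in "DE")

def mean_hydropathy(seq):
    """Average Kyte-Doolittle hydropathy (GRAVY-style score)."""
    return sum(KD[a] for a in seq) / len(seq)

def passes_screen(seq, min_charge=2, hmin=-1.0, hmax=2.0):
    """Keep sequences that are cationic and moderately hydrophobic (AMP-like).
    Thresholds here are illustrative, not the embodiment's tuned criteria."""
    return net_charge(seq) >= min_charge and hmin <= mean_hydropathy(seq) <= hmax

generated = ["GIGKFLHSAKKFGKAFVGEIMNS", "DDEEDDEE"]
print([s for s in generated if passes_screen(s)])  # the anionic decoy is dropped
```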
In some of these embodiments, the method further comprises:
Step S250, training a binary classification model based on the obtained antimicrobial peptide label set and non-antimicrobial peptide sequence label set.
Step S260, during training of the binary classification model, adjusting its parameters based on ten-fold cross-validation results to obtain the pre-trained antimicrobial peptide classifier.
Specifically, ten-fold cross-validation divides all the data into ten parts; each part in turn serves as the validation set while the remaining parts serve as the training set. Throughout this process the hyper-parameters are kept fixed, and the average training loss and average validation loss of the ten models are used to measure the quality of the hyper-parameters. Finally, once satisfactory hyper-parameters are obtained, all the data are used as the training set and the final model is trained with those hyper-parameters. Cross-validation reduces the randomness of a single train/validation split and improves the generalization ability of the model: by fully exploiting the existing dataset through multiple splits, it avoids selecting hyper-parameters, or a model without generalization ability, merely because of one particular split. A sketch of this loop follows.
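This procedure maps directly onto a standard K-fold loop; the sketch below uses scikit-learn, with a logistic-regression classifier and synthetic features as placeholders for the binary classification model and the peptide features.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))          # placeholder peptide feature vectors
y = rng.integers(0, 2, size=200)        # 1 = AMP, 0 = non-AMP labels

def ten_fold_score(make_model):
    """Train/validate on each of 10 splits with fixed hyper-parameters; the
    average score judges one hyper-parameter setting."""
    scores = []
    for train_idx, val_idx in KFold(n_splits=10, shuffle=True,
                                    random_state=0).split(X):
        model = make_model()
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[val_idx], y[val_idx]))
    return float(np.mean(scores))

print(ten_fold_score(lambda: LogisticRegression(max_iter=1000)))
# Once the hyper-parameters are chosen, refit on all the data:
final_model = LogisticRegression(max_iter=1000).fit(X, y)
```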
In some of these embodiments, step S250, training the binary classification model based on the obtained antimicrobial peptide label set and non-antimicrobial peptide sequence label set, comprises:
Step S251, extracting features from the antimicrobial peptide label set and the non-antimicrobial peptide sequence label set to obtain antimicrobial peptide features and non-antimicrobial peptide features.
Step S252, inputting the antimicrobial peptide features and the non-antimicrobial peptide features into the binary classification model, and training the binary classification model.
Specifically, the feature extraction methods include the pseudo K-tuple reduced amino acid composition (PseKRAAC) encoding and the quasi-sequence-order (QSOrder) encoding.
After feature extraction, feature selection is performed in a two-step process. First, features are ranked by their Pearson correlation coefficient (PCC) with the target, calculated as follows:

$$\mathrm{PCC} = \frac{\sum_{i=1}^{N}\left(y_i-\mu_y\right)\left(\hat{y}_i-\mu_{\hat{y}}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_i-\mu_y\right)^2}\,\sqrt{\sum_{i=1}^{N}\left(\hat{y}_i-\mu_{\hat{y}}\right)^2}}$$

where $y_i$ represents the true target value, $\hat{y}_i$ represents the predicted value, $\mu_y$ and $\mu_{\hat{y}}$ are the means of the true and predicted values, respectively, and $N$ is the total number of samples. This step ranks the features by their predictive power, thereby quantifying their effectiveness. The most effective features are then selected and further evaluated using a multi-branch convolutional neural network with attention (MBC-Attention) model, yielding the antimicrobial and non-antimicrobial peptide features used to train the classification model.
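Feature ranking by PCC can be sketched in a few lines; `np.corrcoef` computes the same quantity as the formula above, and the synthetic feature matrix with one planted informative column is a placeholder.

```python
import numpy as np

def rank_features_by_pcc(X, y, top_k=10):
    """Rank feature columns by |PCC| between each feature and the target."""
    pccs = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    order = np.argsort(-np.abs(pccs))           # most predictive first
    return order[:top_k], pccs[order[:top_k]]

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=300).astype(float)  # 1 = AMP, 0 = non-AMP
X = rng.normal(size=(300, 50))
X[:, 7] += 2.0 * y                              # plant one informative feature
idx, vals = rank_features_by_pcc(X, y, top_k=3)
print(idx, np.round(vals, 2))                   # feature 7 should rank first
```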
After the feature engineering process, the feature data are used to train a binary classification model based on the gradient-boosted decision tree algorithm XGBoost (Extreme Gradient Boosting), with antimicrobial peptide features labeled 1 and non-antimicrobial peptide features labeled 0. The XGBoost model optimizes a regularized objective function in order to balance the accuracy and complexity of the model and thereby prevent overfitting. The objective function $\mathcal{L}(\phi)$ is defined as follows:

$$\mathcal{L}(\phi) = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k} \Omega\left(f_k\right)$$

where $n$ represents the number of training samples, $l$ represents the loss function, $k$ indexes the trees in the XGBoost ensemble, and $\hat{y}_i$ is the prediction for $x_i$, computed as $\hat{y}_i = \sum_{k} f_k(x_i)$. $\Omega(f)$ is the regularization term $\Omega(f) = \gamma T + \tfrac{1}{2}\lambda \sum_{j=1}^{T} w_j^2$, where $T$ represents the number of leaves in the tree, $w_j$ represents the weight of the $j$-th leaf, $\gamma$ controls the number of leaves, and $\lambda$ controls the L2 norm of the leaf weights.
During training, the XGBoost model is constructed additively, optimizing the following objective $\mathcal{L}^{(t)}$ at each iteration $t$:

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega\left(f_t\right)$$

where $n$ represents the number of samples in the training process, $l$ represents the loss function, $\Omega(f_t)$ represents the regularization term at iteration $t$, and $f_t(x_i)$ represents the prediction made by the new tree $f_t$ on sample $x_i$ in the $t$-th iteration.
In the objective function $\mathcal{L}^{(t)}$, the first term computes the overall loss over all samples in the dataset after the new tree $f_t$ is added to improve the previous round's predictions $\hat{y}_i^{(t-1)}$. The regularization term $\Omega(f_t)$ penalizes the complexity of the new tree $f_t$ to prevent overfitting.
A second-order Taylor expansion is used to approximate $\mathcal{L}^{(t)}$:

$$\mathcal{L}^{(t)} \approx \sum_{i=1}^{n}\left[l\left(y_i, \hat{y}_i^{(t-1)}\right) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i)\right] + \Omega\left(f_t\right)$$

where $g_i$ and $h_i$ are the first- and second-order gradients of the loss with respect to $\hat{y}_i^{(t-1)}$, respectively.
The XGBoost model is tuned based on the F1 score and the AUC metric, using ten-fold cross-validation to prevent overfitting. The F1 score considers both precision and recall and provides a balanced measure of model accuracy, particularly when dealing with imbalanced classes; it is defined as the harmonic mean of precision and recall:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where precision is the proportion of predicted positives that are actually positive, and recall (also called sensitivity or true positive rate) is the proportion of actual positives that the model correctly identifies, calculated as follows:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

where $TP$ represents true positives and $FN$ represents false negatives.
The optimal split for each node during training is determined by maximizing the gain:

$$\mathrm{Gain} = \frac{1}{2}\left[\frac{\left(\sum_{i \in I_L} g_i\right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left(\sum_{i \in I_R} g_i\right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left(\sum_{i \in I} g_i\right)^2}{\sum_{i \in I} h_i + \lambda}\right] - \gamma$$

where $I_L$ and $I_R$ represent the sample sets of the left and right child nodes produced by the split, $I = I_L \cup I_R$, $g_i$ and $h_i$ are the first- and second-order gradients, respectively, $\lambda$ controls the L2 norm of the leaf weights to prevent overfitting, and $\gamma$ penalizes splits that increase the number of leaves of the tree.
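The gain formula can be checked numerically: g and h are the per-sample gradients defined above, and a candidate split simply partitions the sample indices into I_L and I_R. The gradient values below are made up for illustration.

```python
import numpy as np

def leaf_score(g_sum, h_sum, lam):
    """Optimal leaf contribution G^2 / (H + lambda) from the XGBoost derivation."""
    return g_sum ** 2 / (h_sum + lam)

def split_gain(g, h, left_idx, right_idx, lam=1.0, gamma=0.1):
    """Gain = 1/2 [score(L) + score(R) - score(L+R)] - gamma."""
    gl, hl = g[left_idx].sum(), h[left_idx].sum()
    gr, hr = g[right_idx].sum(), h[right_idx].sum()
    return 0.5 * (leaf_score(gl, hl, lam) + leaf_score(gr, hr, lam)
                  - leaf_score(gl + gr, hl + hr, lam)) - gamma

g = np.array([-0.8, -0.6, 0.7, 0.9])     # first-order gradients g_i (illustrative)
h = np.array([0.2, 0.2, 0.2, 0.2])       # second-order gradients h_i
print(split_gain(g, h, [0, 1], [2, 3]))  # separating the signs yields positive gain
```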
Shrinkage (learning rate $\eta$) is applied to scale the prediction of each tree:

$$\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta f_t(x_i)$$

where $\hat{y}_i^{(t)}$ represents the predicted value for sample $x_i$ after the $t$-th tree is added, $\hat{y}_i^{(t-1)}$ represents the predicted value for sample $x_i$ before the $t$-th tree is added, $f_t(x_i)$ represents the output of the new tree $f_t$ for sample $x_i$ in the $t$-th iteration, and $\eta$ represents the learning rate, which prevents overfitting by reducing the influence of each tree.
The best performance model determined by cross-validation is used for subsequent analysis.
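Putting the classifier pieces together, the sketch below trains an XGBoost binary classifier and scores it with F1 and AUC under ten-fold cross-validation. The feature matrices are synthetic placeholders, and the hyper-parameter values shown for `learning_rate` (shrinkage η), `gamma`, and `reg_lambda` (λ) are illustrative rather than the tuned ones.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from xgboost import XGBClassifier

rng = np.random.default_rng(2)
X_amp = rng.normal(loc=0.5, size=(150, 32))    # placeholder AMP features (label 1)
X_non = rng.normal(loc=-0.5, size=(150, 32))   # placeholder non-AMP features (label 0)
X = np.vstack([X_amp, X_non])
y = np.array([1] * 150 + [0] * 150)

clf = XGBClassifier(
    n_estimators=200,
    learning_rate=0.1,   # eta: shrinks each tree's contribution
    gamma=0.1,           # penalizes splits that add leaves
    reg_lambda=1.0,      # L2 penalty on leaf weights
    max_depth=4,
    eval_metric="logloss",
)
cv = cross_validate(clf, X, y, cv=10, scoring=("f1", "roc_auc"))
print("F1:  %.3f" % cv["test_f1"].mean())
print("AUC: %.3f" % cv["test_roc_auc"].mean())
```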
Fig. 3 shows the receiver operating characteristic (ROC) curves of the XGBoost-based antimicrobial peptide classifier under ten-fold cross-validation. Each curve is the ROC curve of one fold, depicting the trade-off between true positive rate and false positive rate at different threshold settings. The average ROC curve, indicated by the bold line, summarizes the overall performance of the model, and the area under the curve (AUC) provides a measure of classification accuracy.
The present embodiment is described and illustrated below by way of preferred embodiments.
Fig. 4 is a flowchart of the method for generating antimicrobial peptides based on the generative model according to this preferred embodiment. As shown in fig. 4, the method includes:
S1, constructing an antimicrobial peptide dataset, i.e., the collection of existing antimicrobial peptide sequences: extracting antimicrobial peptide data from public antimicrobial peptide databases, then merging, deduplicating, and filtering to delete incomplete or meaningless entries, thereby obtaining the antimicrobial peptide dataset.
S2, constructing an antimicrobial peptide multiple-sequence-alignment dataset: expanding the collection of existing antimicrobial peptide sequences based on multiple sequence alignment.
S3, inputting the antimicrobial peptide multiple-sequence-alignment dataset into a pre-trained order-agnostic autoregressive diffusion model, and generating a predicted sequence in an arbitrary decoding order.
S4, screening the predicted sequence by physicochemical properties to obtain a screened sequence.
S5, training a binary classification model based on the obtained antimicrobial peptide label set and non-antimicrobial peptide sequence label set, and during training adjusting its parameters based on ten-fold cross-validation results to obtain the antimicrobial peptide classifier.
S6, inputting the screened sequence into the antimicrobial peptide classifier, and judging whether it has antimicrobial activity.
S7, outputting the sequence with antimicrobial activity as the final target antimicrobial peptide sequence.
The preferred embodiment thus provides a generate-and-screen strategy that combines a diffusion model, multiple-sequence-alignment data, and machine-learning tools, markedly improving the design efficiency and success rate of antimicrobial peptides and representing an important innovation in bioinformatics and protein engineering. A high-level orchestration sketch of steps S1-S7 follows.
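As a high-level orchestration of steps S1-S7, the following sketch only asserts how the stages compose; every callable passed in is a hypothetical stand-in for the corresponding component described above.

```python
def generate_target_amps(raw_dumps, clean, build_msa, sample, screen,
                         featurize, classify, n_candidates=1000):
    """End-to-end pipeline S1-S7; each stage is injected as a callable, so the
    composition itself is the only thing this sketch asserts."""
    amp_set = clean(raw_dumps)                        # S1: merged, cleaned AMP set
    amp_msa = build_msa(amp_set)                      # S2: multiple sequence alignment
    candidates = (sample(amp_msa)                     # S3: OADM, arbitrary order
                  for _ in range(n_candidates))
    screened = (s for s in candidates if screen(s))   # S4: physicochemical filter
    return [s for s in screened                       # S6/S7: keep active sequences
            if classify(featurize(s))]                # classifier trained in S5

# Toy invocation with trivial stand-ins, just to show the plumbing:
print(generate_target_amps(
    [["KWKLFKKIGAVLKVL"]], clean=lambda d: d[0], build_msa=lambda s: s,
    sample=lambda msa: msa[0], screen=lambda s: True,
    featurize=lambda s: s, classify=lambda f: True, n_candidates=3))
```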
It should be noted that the steps illustrated in the above-described flow or flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order other than that illustrated herein.
In this embodiment, an apparatus for generating antimicrobial peptides based on a generative model is further provided. The apparatus is used to implement the foregoing embodiments and preferred embodiments, and what has already been described is not repeated. The terms "module," "unit," "sub-unit," and the like as used below may refer to a combination of software and/or hardware that performs a predetermined function. While the apparatus described in the following embodiments is preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
Fig. 5 is a block diagram of the apparatus for generating antimicrobial peptides based on a generative model according to this embodiment. As shown in fig. 5, the apparatus includes an existing data acquisition module 51, a new sequence prediction module 52, an activity verification module 53, and a target antimicrobial peptide output module 54.
The existing data acquisition module 51 is configured to acquire a collection of existing antimicrobial peptide sequences.
The new sequence prediction module 52 is configured to predict novel antimicrobial peptides based on the existing antimicrobial peptide sequences by using a pre-trained generative model, to obtain a predicted sequence.
The activity verification module 53 is configured to input the predicted sequence into a pre-trained antimicrobial peptide classifier and judge whether the predicted sequence has antimicrobial activity, wherein the antimicrobial peptide classifier is obtained by training based on an antimicrobial peptide label set and a non-antimicrobial peptide sequence label set.
The target antimicrobial peptide output module 54 is configured to output the predicted sequence with antimicrobial activity as the final target antimicrobial peptide sequence.
In some embodiments, acquiring the collection of existing antimicrobial peptide sequences includes extracting initial antimicrobial peptide data from public databases and cleaning the initial data to obtain the collection of existing antimicrobial peptide sequences.
In some embodiments, obtaining a predicted sequence structurally similar to the existing antimicrobial peptide sequences by using the pre-trained generative model includes expanding the collection of existing antimicrobial peptide sequences based on multiple sequence alignment to obtain an antimicrobial peptide multiple-sequence-alignment dataset, inputting this dataset into the pre-trained generative model, and generating the predicted sequence in an arbitrary decoding order.
In some of these embodiments, the pre-trained generative model employs an order-agnostic autoregressive diffusion model.
In some embodiments, inputting the predicted sequence into the pre-trained antimicrobial peptide classifier to judge whether it has antimicrobial activity includes screening the predicted sequence by physicochemical properties to obtain a screened sequence, inputting the screened sequence into the pre-trained antimicrobial peptide classifier, and outputting a conclusion as to whether the screened sequence has antimicrobial activity.
In some embodiments, the apparatus is further configured to train a binary classification model based on the obtained antimicrobial peptide label set and non-antimicrobial peptide sequence label set and, during training, to adjust the parameters of the binary classification model based on ten-fold cross-validation results to obtain the pre-trained antimicrobial peptide classifier.
In some embodiments, training the binary classification model based on the obtained antimicrobial peptide label set and non-antimicrobial peptide sequence label set includes extracting features from the two label sets to obtain antimicrobial peptide features and non-antimicrobial peptide features, inputting these features into the binary classification model, and training the binary classification model.
The above-described respective modules may be functional modules or program modules, and may be implemented by software or hardware. For modules implemented in hardware, the modules may be located in the same processor, or may be located in different processors in any combination.
There is also provided in this embodiment a computer device comprising a memory in which a computer program is stored and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the computer device may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
It should be noted that, specific examples in this embodiment may refer to examples described in the foregoing embodiments and alternative implementations, and are not described in detail in this embodiment.
In addition, in combination with the method for generating antimicrobial peptides based on a generative model provided in the above embodiments, a storage medium may also be provided in this embodiment. The storage medium has stored thereon a computer program which, when executed by a processor, implements the method for generating antimicrobial peptides based on a generative model of any of the above embodiments.
It should be understood that the specific embodiments described herein are merely illustrative of this application and are not intended to be limiting. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure in accordance with the embodiments provided herein.
It is to be understood that the drawings are merely illustrative of some embodiments of the present application and that it is possible for those skilled in the art to adapt the present application to other similar situations without the need for inventive work. In addition, it should be appreciated that while the development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as a departure from the disclosure.
The term "embodiment" in this disclosure means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive. It will be clear or implicitly understood by those of ordinary skill in the art that the embodiments described in the present application can be combined with other embodiments without conflict.
The above embodiments merely represent a few implementations of the present application; their description is relatively specific and detailed, but should not be construed as limiting the scope of the patent claims. It should be noted that several variations and modifications can be made by those of ordinary skill in the art without departing from the concept of the present application, all of which fall within the protection scope of the present application. Accordingly, the scope of protection shall be subject to the appended claims.