Disclosure of Invention
The invention aims to solve the problems described in the background art and provides a big-data-based rapeseed quality evaluation model.
To achieve the above objective, the present invention adopts the following technical solution:
A big-data-based rapeseed quality evaluation model, comprising:
a multi-source heterogeneous data acquisition and preprocessing module, configured to acquire multi-source heterogeneous rapeseed data and to process and analyze the acquired data to generate a multi-source heterogeneous rapeseed data pool;
a cross-domain feature interaction map construction module, configured to construct a cross-domain feature interaction map based on the multi-source heterogeneous rapeseed data pool, perform deep feature learning on the constructed map using a graph neural network, analyze gene-environment interaction relationships, and output an updated cross-domain feature interaction map;
a multi-modal feature fusion and optimization module, configured to analyze the cross-domain feature interaction map through a preset four-branch multi-head attention architecture, generate dual weights and fused features in combination with a cross-modal attention mechanism, and, after multi-head attention screening, apply dual optimization to generate an optimized comprehensive feature vector that fuses multi-dimensional information;
a feature evaluation analysis module, configured to decode and analyze the optimized comprehensive feature vector so as to construct a rapeseed quality evaluation model and extract high-yield rapeseed variety data;
and a transfer learning adaptation deployment module, configured to deploy a generative adversarial network using the extracted high-yield rapeseed variety data, adapt the quality evaluation model to the feature space of new varieties through transfer learning, and complete intelligent quality evaluation of new rapeseed seeds.
Further, the process by which the multi-source heterogeneous data acquisition and preprocessing module acquires multi-source heterogeneous rapeseed data and processes and analyzes the acquired data to generate the multi-source heterogeneous rapeseed data pool comprises the following steps:
Extracting rapeseed whole-genome SNP marker data from a gene database, collecting field phenotype data through an Internet of Things (IoT) sensor network, accessing real-time weather-station monitoring data streams to obtain field environmental parameters, and acquiring dynamic image data over the rapeseed growth cycle using an unmanned aerial vehicle (UAV) equipped with a multispectral camera;
Storing all collected datasets uniformly in an HBase distributed database and preprocessing them: cleaning the rapeseed whole-genome SNP marker data to remove duplicate and low-quality sites and retaining the set of valid SNP sites as a genotype dataset; parsing the time-series records of the field environmental parameters to build an environmental parameter time-series index; retrieving the UAV aerial image data, extracting features from the dynamic image data with computer vision techniques, segmenting and labeling the image timestamps by growth stage, and extracting vegetation indices as primary phenotype features; and performing spatiotemporal alignment of the collected field phenotype data with the extracted primary phenotype features to eliminate sampling-time deviations, thereby obtaining collaborative phenotype features;
Constructing an SNP-chromosome association matrix from the genotype dataset as the gene-dimension base framework of a rapeseed original data pool; mapping the environmental parameter time-series index and the collaborative phenotype features onto this framework by timestamp to establish cross-domain data associations; using the column-family storage model of the HBase distributed database to partition the data into genotype, environment, and phenotype column families and defining a composite encoding rule for the row key; and performing data integrity verification, checking the consistency of the cross-domain data with hash checksums, to complete construction of the rapeseed original data pool;
Standardizing the rapeseed original data pool to obtain a rapeseed standardized data pool.
Further, the process by which the cross-domain feature interaction map construction module constructs the cross-domain feature interaction map based on the multi-source heterogeneous rapeseed data pool, performs deep feature learning on it using a graph neural network, analyzes gene-environment interaction relationships, and outputs the updated cross-domain feature interaction map comprises the following steps:
Separating genotype nodes, environment nodes, and phenotype nodes from the rapeseed standardized data pool, and one-hot encoding each node to generate initial feature vectors that serve as the initial feature representation of the map;
After the initial node feature representation is complete, performing association analysis between gene nodes and environment nodes, namely computing the mutual information between each gene node and environment node to quantify the degree of association between the two node types;
Generating a static map from the constructed gene-environment binary edge set and the initial node feature representations: a graph convolution layer aggregates neighboring node features, so that each node captures information from the nodes around it and fuses that information into the feature representation of the target node, producing node embeddings;
Inputting the generated static map into a three-layer graph attention network (GAT), and using a gated recurrent unit (GRU) to capture temporal changes in the gene-environment edge weights;
Defining a gene-environment interaction strength index, computed as the edge weight multiplied by the cosine similarity of the node embeddings, where the edge weight represents the association strength between a gene node and an environment node and the cosine similarity measures how similar the two nodes are in feature space;
Introducing a graph contrastive learning framework to further analyze the top-N strongly associated edges by index value: perturbing the static map by randomly modifying node features and edge weights, computing the interaction strength change rate by comparing the node representations of the map before and after perturbation, setting a change rate threshold M, screening out key feature edges whose interaction strength change rate is greater than or equal to M, and integrating all screened key feature edges into the static map to form the final dynamic map, namely the updated cross-domain feature interaction map.
Further, the process of inputting the generated static map into the three-layer graph attention network and capturing temporal changes in the gene-environment edge weights with the gated recurrent unit comprises the following steps:
Each graph attention layer uses a multi-head attention mechanism that learns attention relationships among nodes in multiple different subspaces; in each layer, attention coefficients between nodes are computed to reflect their relative importance and association strength;
Using the rate of change of the environmental parameters as the gating signal, the gated recurrent unit dynamically corrects the gene-environment edge weights according to how the environmental parameters change.
Further, the process by which the multi-modal feature fusion and optimization module analyzes the cross-domain feature interaction map through the preset four-branch multi-head attention architecture, generates dual weights and fused features in combination with the cross-modal attention mechanism, and applies dual optimization after multi-head attention screening to generate the optimized comprehensive feature vector fusing multi-dimensional information comprises the following steps:
Inputting the constructed cross-domain feature interaction map into a preset four-branch multi-head attention submodule, which is divided into four units: a gene focus head, an environment focus head, a phenotype focus head, and an interaction focus head, which process gene, environment, phenotype, and gene-environment interaction features, respectively;
Simultaneously introducing a cross-modal attention mechanism between the gene and environment branches, namely computing the dot product of the gene features and the environment features to obtain the cross-correlation strength between them;
Concatenating the outputs of the four branches and feeding the concatenated feature vector into a multi-layer perceptron (MLP), which further analyzes it to generate a four-dimensional weight matrix; normalizing this matrix with a Softmax function yields the gene-environment interaction weight and the phenotype response weight;
Performing a weighted summation of the gene, environment, phenotype, and gene-environment interaction features based on the gene-environment interaction weight and the phenotype response weight to form an initial fusion vector;
Applying a multi-head self-attention mechanism, which captures associations among the elements of the initial fusion vector in multiple different subspaces, to compute a global association estimate for each element; setting a global association threshold K and screening out the core feature subset whose global association estimates are greater than or equal to K;
Setting up a residual connection submodule: inputting the features of the cross-domain feature interaction map and the screened core feature subset into a convolutional neural network to generate a residual correction term; applying non-negative matrix factorization to the residual correction term and extracting latent factors as constituent units of the optimized comprehensive feature vector; and concatenating the extracted latent factors with the previously screened core features and weighting them by attention scores to generate the final optimized comprehensive feature vector.
Further, the process by which the feature evaluation analysis module decodes and analyzes the optimized comprehensive feature vector to construct the rapeseed quality evaluation model and extract high-yield rapeseed variety data comprises the following steps:
Decoupling the feature domains with a sparse autoencoder, in which an L1 regularization constraint forces the hidden-layer activations to be sparse, and separating out a gene feature sub-vector, an environment feature sub-vector, a phenotype feature sub-vector, and an interaction feature sub-vector, each corresponding to an independent feature domain;
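As a minimal sketch of the sparse-autoencoder step, assuming a single ReLU hidden layer, a linear decoder, and plain gradient descent written out by hand in NumPy: the loss is the mean squared reconstruction error plus an L1 penalty on the hidden activations, which is what drives them toward sparsity. The hidden size, the penalty weight `lam`, and the helper `split_domains`, which simply reads the four feature-domain sub-vectors off contiguous blocks of hidden units, are illustrative assumptions rather than details given in this description.

```python
import numpy as np

def train_sparse_autoencoder(X, hidden=8, lam=1e-3, lr=0.01, steps=300, seed=0):
    """Gradient descent on reconstruction MSE plus an L1 penalty on hidden activations."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d)); b2 = np.zeros(d)
    losses = []
    for _ in range(steps):
        Z = X @ W1 + b1
        H = np.maximum(Z, 0.0)                        # ReLU encoder
        X_hat = H @ W2 + b2                           # linear decoder
        loss = np.sum((X_hat - X) ** 2) / n + lam * np.sum(np.abs(H)) / n
        losses.append(loss)
        # Backpropagation written out by hand.
        dX_hat = 2.0 * (X_hat - X) / n
        dW2 = H.T @ dX_hat; db2 = dX_hat.sum(axis=0)
        dH = dX_hat @ W2.T + (lam / n) * np.sign(H)   # subgradient of the L1 term
        dZ = dH * (Z > 0)
        dW1 = X.T @ dZ; db1 = dZ.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return (W1, b1), losses

def split_domains(H, n_domains=4):
    """Hypothetical: read gene/environment/phenotype/interaction sub-vectors off contiguous hidden blocks."""
    return np.array_split(H, n_domains, axis=1)

X = np.random.default_rng(1).normal(size=(64, 12))
(W1, b1), losses = train_sparse_autoencoder(X)
codes = np.maximum(X @ W1 + b1, 0.0)
gene, env, pheno, inter = split_domains(codes)
```

In a trained model, the assignment of hidden units to the gene, environment, phenotype, and interaction domains would have to be learned or imposed by the architecture; the equal contiguous split only shows the shape of the output.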
Constructing a capsule processing unit for each decoupled feature domain to form a parallel feature-domain processing pipeline, in which the information transfer weights among the sub-vectors are determined by an iterative routing algorithm;
Aggregating the decoupled feature sub-vectors by weighting them with the routing weights and propagating them layer by layer through the capsule network; in the local feature fusion stage, capsules of adjacent feature domains are combined pairwise and their local interaction patterns are refined through dynamic routing;
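The description does not state which iterative routing algorithm is used. Assuming the routing-by-agreement procedure commonly used in capsule networks, the step can be sketched as follows: each lower-level capsule (here, one per feature domain) makes a prediction `u_hat` for each higher-level capsule, the coupling coefficients are a softmax over routing logits, and the logits grow where a prediction agrees with the resulting output capsule. The shapes and the three routing iterations are illustrative.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-9):
    """Capsule non-linearity: keeps the direction, maps the length into [0, 1)."""
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def dynamic_routing(u_hat, iterations=3):
    """Routing by agreement.

    u_hat: (n_in, n_out, dim) predictions of each lower capsule for each higher capsule.
    Returns output capsules v (n_out, dim) and coupling coefficients c (n_in, n_out).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))                       # routing logits
    for _ in range(iterations):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)             # softmax over higher capsules
        s = np.einsum("io,iod->od", c, u_hat)         # weighted sum of predictions
        v = squash(s)
        b = b + np.einsum("iod,od->io", u_hat, v)     # agreement raises the logits
    return v, c

rng = np.random.default_rng(0)
u_hat = rng.normal(size=(4, 2, 6))   # 4 domain capsules -> 2 higher capsules, 6-dim poses
v, c = dynamic_routing(u_hat)
```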
Setting up a multi-scale feature fusion mechanism based on the results of this progressive feature refinement, which outputs fused features at two levels, local feature fusion and global feature fusion, where local feature fusion retains detailed interaction information through the weighted combination of adjacent feature-domain capsules;
Quantitatively analyzing the local and global fused features obtained from the multi-scale feature fusion mechanism with a feature importance evaluation method to determine the contribution of each feature domain, and of each combination of feature domains, to rapeseed variety labeling;
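The feature importance evaluation method is not named. One model-agnostic option, assumed here, is group permutation importance: the columns of one feature domain, or of a combination of two domains, are shuffled jointly and the resulting increase in a fitted model's error is recorded. The `predict` function, the column grouping, and the toy data in the usage lines are hypothetical.

```python
import numpy as np
from itertools import combinations

def group_permutation_importance(predict, X, y, groups, max_combo=2, n_repeats=10, seed=0):
    """Mean increase in MSE when the columns of a domain, or of a domain combination, are shuffled together."""
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    scores = {}
    for r in range(1, max_combo + 1):
        for combo in combinations(groups, r):
            cols = [c for name in combo for c in groups[name]]
            deltas = []
            for _ in range(n_repeats):
                X_perm = X.copy()
                rows = rng.permutation(len(X))
                X_perm[:, cols] = X[rows][:, cols]   # same row shuffle for every column in the group
                deltas.append(np.mean((predict(X_perm) - y) ** 2) - base)
            scores[combo] = float(np.mean(deltas))
    return scores

# Hypothetical toy data: two columns per feature domain; the "model" uses one gene and one interaction column.
groups = {"gene": [0, 1], "environment": [2, 3], "phenotype": [4, 5], "interaction": [6, 7]}
X = np.random.default_rng(0).normal(size=(200, 8))
predict = lambda A: 2.0 * A[:, 0] + A[:, 6]
y = predict(X)
scores = group_permutation_importance(predict, X, y, groups)
```

Shuffling a domain the model ignores leaves its predictions unchanged, so that domain scores exactly zero; domains the model relies on score above zero.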
Screening out, according to the quantitative analysis results, the feature domains and combinations that play a key role in labeling high-yield rapeseed varieties, using them as the criteria for subsequent labeling and screening, and finally generating a labeling result for each rapeseed variety.
Further, the process by which the transfer learning adaptation deployment module deploys a generative adversarial network using the extracted high-yield rapeseed variety data, adapts the quality evaluation model to the feature space of new varieties through transfer learning, and completes intelligent quality evaluation of new rapeseed seeds comprises the following steps:
Taking the extracted high-yield rapeseed variety data as source-domain data and inputting it into a preset adversarial training submodule to pretrain the quality evaluation model so that it learns general feature representations related to rapeseed seed quality; using a transfer learning strategy to transfer the knowledge the pretrained model acquired on the high-yield variety dataset to the seed quality evaluation task for new rapeseed varieties; and, at the same time, tuning and optimizing the parameters of the quality evaluation model according to the characteristics of the new variety data.
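A deliberately simplified sketch of the pretrain-then-adapt step, under stated assumptions: a linear scoring model stands in for the quality evaluation model, the adversarial pretraining is replaced by an ordinary least-squares fit on source-domain data, and adaptation is gradient descent on a small new-variety dataset that starts from the pretrained weights, with an L2 penalty `lam` that keeps the weights close to the source solution. All names and values are hypothetical.

```python
import numpy as np

def pretrain(X_src, y_src):
    """Least-squares fit on source-domain (high-yield variety) data."""
    w, *_ = np.linalg.lstsq(X_src, y_src, rcond=None)
    return w

def fine_tune(w0, X_tgt, y_tgt, lam=0.1, lr=0.05, steps=200):
    """Gradient descent on target MSE + lam * ||w - w0||^2, starting from the pretrained weights."""
    w = w0.copy()
    n = len(X_tgt)
    for _ in range(steps):
        grad = 2.0 * X_tgt.T @ (X_tgt @ w - y_tgt) / n + 2.0 * lam * (w - w0)
        w -= lr * grad
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(0)
w_src_true = rng.normal(size=5)
w_new_true = w_src_true + 0.3 * rng.normal(size=5)     # new variety: shifted relationship
X_src = rng.normal(size=(500, 5)); y_src = X_src @ w_src_true + 0.1 * rng.normal(size=500)
X_new = rng.normal(size=(30, 5)); y_new = X_new @ w_new_true + 0.1 * rng.normal(size=30)
w_pre = pretrain(X_src, y_src)
w_adapted = fine_tune(w_pre, X_new, y_new)
```

Starting from the source weights and penalizing drift away from them is one common way to reuse source knowledge when the new-variety dataset is small.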
Compared with the prior art, the invention has the following advantages. By acquiring multi-source heterogeneous rapeseed data and building a data pool, it integrates data from different channels and with different structures, breaks down data silos, and provides a comprehensive and rich data basis for subsequent analysis, which facilitates the mining of further latent information. By constructing the cross-domain feature interaction map and learning on it with a graph neural network, it presents the interactions between gene and environmental features intuitively and mines the gene-environment interaction mechanism in depth, providing a more accurate basis for subsequent analysis and strong support for the updated map it outputs. By analyzing the map with the multi-branch multi-head attention architecture combined with the cross-modal attention mechanism, it gives full attention to the importance of features in different dimensions, fuses multi-dimensional information, and applies dual optimization to generate the optimized comprehensive feature vector, which improves the quality and representativeness of the features and strengthens the model's ability to process complex data. By decoding the optimized feature vector to construct the evaluation model and extract high-yield variety data, it can be applied directly to rapeseed seed quality evaluation while providing an important reference for subsequent research on new varieties. Finally, by deploying a generative adversarial network and adapting the model to new varieties through transfer learning, it enables intelligent quality evaluation of new rapeseed varieties, which facilitates rapid screening and accelerates the development of new high-yield varieties.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.
Referring to FIG. 1, a big-data-based rapeseed quality evaluation model includes:
a multi-source heterogeneous data acquisition and preprocessing module, configured to acquire multi-source heterogeneous rapeseed data and to process and analyze the acquired data to generate a multi-source heterogeneous rapeseed data pool;
a cross-domain feature interaction map construction module, configured to construct a cross-domain feature interaction map based on the multi-source heterogeneous rapeseed data pool, perform deep feature learning on the constructed map using a graph neural network, analyze gene-environment interaction relationships, and output an updated cross-domain feature interaction map;
a multi-modal feature fusion and optimization module, configured to analyze the cross-domain feature interaction map through a preset four-branch multi-head attention architecture, generate dual weights and fused features in combination with a cross-modal attention mechanism, and, after multi-head attention screening, apply dual optimization to generate an optimized comprehensive feature vector that fuses multi-dimensional information;
a feature evaluation analysis module, configured to decode and analyze the optimized comprehensive feature vector so as to construct a rapeseed quality evaluation model and extract high-yield rapeseed variety data;
and a transfer learning adaptation deployment module, configured to deploy a generative adversarial network using the extracted high-yield rapeseed variety data, adapt the quality evaluation model to the feature space of new varieties through transfer learning, and complete intelligent quality evaluation of new rapeseed seeds.
It should be further noted that, in a specific implementation, the process by which the multi-source heterogeneous data acquisition and preprocessing module acquires multi-source heterogeneous rapeseed data and processes and analyzes the acquired data to generate the multi-source heterogeneous rapeseed data pool is as follows:
Extracting rapeseed whole-genome SNP marker data from a gene database, acquiring field phenotype data through an Internet of Things (IoT) sensor network, accessing real-time weather-station monitoring data streams to obtain field environmental parameters, and acquiring dynamic image data over the rapeseed growth cycle using an unmanned aerial vehicle (UAV) equipped with a multispectral camera, where the phenotype data acquired by the field sensors include the number of branches, yield, and the like, and the field environmental parameters include environmental monitoring data such as soil moisture, temperature, and light intensity;
Storing all collected datasets uniformly in an HBase distributed database and preprocessing them: cleaning the rapeseed whole-genome SNP marker data to remove duplicate and low-quality sites and retaining the set of valid SNP sites as a genotype dataset; parsing the time-series records of the field environmental parameters to build an environmental parameter time-series index; retrieving the UAV aerial image data, extracting features from the dynamic image data with computer vision techniques, segmenting and labeling the image timestamps by growth stage, and extracting vegetation indices as primary phenotype features; and performing spatiotemporal alignment of the collected field phenotype data with the extracted primary phenotype features to eliminate sampling-time deviations, thereby obtaining collaborative phenotype features;
Constructing an SNP-chromosome association matrix from the genotype dataset as the gene-dimension base framework of a rapeseed original data pool; mapping the environmental parameter time-series index and the collaborative phenotype features onto this framework by timestamp to establish cross-domain data associations; using the column-family storage model of the HBase distributed database to partition the data into genotype, environment, and phenotype column families and defining a composite encoding rule for the row key; and performing data integrity verification, checking the consistency of the cross-domain data with hash checksums, to complete construction of the rapeseed original data pool;
Standardizing the rapeseed original data pool to obtain the rapeseed standardized data pool. Specifically, cluster analysis is performed on the original data pool: the DBSCAN algorithm identifies data outliers caused by extreme weather, and abnormal rapeseed data samples are removed based on a local density threshold; missing measurements are filled with a time-series interpolation algorithm; and Z-score standardization is applied to the filled data to eliminate differences in scale between parameters, producing a standardized data pool in which each parameter has zero mean and unit variance.
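The preprocessing chain above can be sketched in NumPy as follows. Assumptions: the DBSCAN step is reduced to its noise criterion (a sample is noise if it is not a core point, i.e. has fewer than `min_samples` neighbours within `eps` counting itself, and is not within `eps` of any core point); linear interpolation over time stands in for the unspecified time-series interpolation algorithm; the `eps` and `min_samples` values are illustrative. The full pairwise distance matrix is fine for a sketch, but a library implementation such as scikit-learn's `DBSCAN` would normally replace the hand-written mask.

```python
import numpy as np

def dbscan_noise_mask(X, eps, min_samples):
    """True for samples that DBSCAN would label as noise (outliers)."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbours = dist <= eps                           # includes the point itself
    core = neighbours.sum(axis=1) >= min_samples
    reachable = core | (neighbours & core[None, :]).any(axis=1)
    return ~reachable

def interpolate_missing(t, series):
    """Fill NaNs in a 1-D series by linear interpolation over time t."""
    filled = series.copy()
    missing = np.isnan(series)
    filled[missing] = np.interp(t[missing], t[~missing], series[~missing])
    return filled

def zscore(X):
    """Standardise each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

rng = np.random.default_rng(0)
samples = np.vstack([rng.normal(0.0, 0.1, size=(20, 2)), [[5.0, 5.0]]])  # last row: extreme-weather outlier
noise = dbscan_noise_mask(samples, eps=0.5, min_samples=3)
clean = samples[~noise]

t = np.arange(6.0)
soil_moisture = np.array([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])
filled = interpolate_missing(t, soil_moisture)     # -> [1, 2, 3, 4, 5, 6]
standardized = zscore(clean)
```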
It should be further noted that, in a specific implementation, the process by which the cross-domain feature interaction map construction module constructs the cross-domain feature interaction map based on the multi-source heterogeneous rapeseed data pool, uses the graph neural network to perform deep feature learning on it, analyzes gene-environment interaction relationships, and outputs the updated cross-domain feature interaction map is as follows:
Separating genotype nodes (SNP locus sequences), environment nodes (time-series data such as soil moisture and temperature), and phenotype nodes (NDVI, plant height, and the like) from the rapeseed standardized data pool, and one-hot encoding each node to generate initial feature vectors that serve as the initial feature representation of the map. For example, genotype, environment, and phenotype data for 1000 samples are taken from the rapeseed standardized data pool: the genotype data cover 50 loci, the environment data cover 10 environmental parameters such as temperature, humidity, and light, and the phenotype data cover 8 phenotype indicators such as plant height and yield. The genotype nodes have 50 possible states and are encoded as 50-dimensional vectors, while the environment nodes and phenotype nodes are encoded as 10-dimensional and 8-dimensional vectors, respectively, giving the initial feature vectors;
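A minimal sketch of the one-hot encoding in this example, assuming that each node is encoded by its index within its own node type (one reading of the description), so that the 50 genotype nodes, 10 environment nodes, and 8 phenotype nodes receive 50-, 10-, and 8-dimensional vectors.

```python
import numpy as np

def one_hot(indices, n_states):
    """One-hot encode integer state indices into a (len(indices), n_states) matrix."""
    indices = np.asarray(indices)
    if indices.min() < 0 or indices.max() >= n_states:
        raise ValueError("state index out of range")
    return np.eye(n_states)[indices]

# Node counts from the example: 50 loci, 10 environmental parameters, 8 phenotype indicators.
gene_feats = one_hot(np.arange(50), 50)    # each genotype node -> 50-dim vector
env_feats = one_hot(np.arange(10), 10)     # each environment node -> 10-dim vector
pheno_feats = one_hot(np.arange(8), 8)     # each phenotype node -> 8-dim vector
```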
After the initial node feature representation is complete, performing association analysis between gene nodes and environment nodes, namely computing the mutual information $I(G;E)$ between a gene node and an environment node, which quantifies the degree of association between the two node types:
$$I(G;E)=\sum_{g\in\mathcal{S}_G}\sum_{e\in\mathcal{S}_E}p(g,e)\log\frac{p(g,e)}{p(g)\,p(e)}$$
where $G$ is the gene node, $\mathcal{S}_G$ is the set of states of the gene node, $E$ is the environment node, $\mathcal{S}_E$ is the set of states of the environment node, $p(g,e)$ is the joint probability of states $g$ and $e$, and $p(g)$ and $p(e)$ are their marginal probabilities. A mutual information threshold (e.g., 0.5) is set and applied to the computed mutual information values to screen out significantly associated node pairs; a pair with a computed value of 0.3, for instance, falls below this threshold and is not retained;
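A plug-in estimate of this mutual information from paired observations can be sketched as follows. Assumptions: the natural logarithm is used (the base is not stated), so the result is in nats; environmental parameters such as temperature are continuous and would first have to be discretized into states, for example by binning, which is not shown; the 0.5 threshold comes from the example above.

```python
import numpy as np
from collections import Counter

def mutual_information(gene_states, env_states):
    """Plug-in estimate of I(G;E) in nats from paired discrete observations."""
    n = len(gene_states)
    joint = Counter(zip(gene_states, env_states))
    p_g = Counter(gene_states)
    p_e = Counter(env_states)
    mi = 0.0
    for (g, e), count in joint.items():        # unobserved pairs contribute 0 (0 log 0 = 0)
        p_ge = count / n
        mi += p_ge * np.log(p_ge / ((p_g[g] / n) * (p_e[e] / n)))
    return mi

gene = [0, 1] * 50
env_dep = [0, 1] * 50            # perfectly dependent on the gene state
env_ind = [0, 0, 1, 1] * 25      # independent of the gene state
mi_dep = mutual_information(gene, env_dep)   # ln 2, about 0.693
mi_ind = mutual_information(gene, env_ind)   # 0.0
retain = mi_dep >= 0.5
```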
Generating a static map from the constructed gene-environment binary edge set and the initial node feature representations: a graph convolution layer aggregates neighboring node features, so that each node captures information from the nodes around it and fuses that information into the feature representation of the target node, producing node embeddings. This aggregation involves information exchange among multiple nodes of the graph structure; at each step of graph convolution, one node is selected as the object currently being processed, and this node is the target node. For example, suppose a gene-environment interaction map contains gene nodes j1, j2, and j3 and environment nodes e1 and e2. When graph convolution is applied to gene node j1, j1 is the target node; if j1 is connected to e1 and e2, the graph convolution layer first captures the feature information of e1 and e2 and then fuses it into the feature representation of j1. The updated features of j1 then contain both its own initial features and the feature information of e1 and e2, describing the interaction between j1 and the environment more accurately;
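The j1/e1/e2 example can be made concrete with one graph convolution layer. The description does not give the convolution formula; this sketch assumes the common symmetric-normalized form with self-loops, ReLU(D^-1/2 (A + I) D^-1/2 H W), one-hot input features, identity weights so that the mixing is visible, and a hypothetical edge set in which j1 connects to e1 and e2, j2 to e1, and j3 to e2.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    Self-loops keep each target node's own features; the normalised
    adjacency mixes in features from its neighbours.
    """
    A_hat = A + np.eye(len(A))
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

nodes = ["j1", "j2", "j3", "e1", "e2"]
idx = {name: i for i, name in enumerate(nodes)}
A = np.zeros((5, 5))
for a, b in [("j1", "e1"), ("j1", "e2"), ("j2", "e1"), ("j3", "e2")]:   # hypothetical edge set
    A[idx[a], idx[b]] = A[idx[b], idx[a]] = 1.0

H0 = np.eye(5)          # one-hot initial features
W = np.eye(5)           # identity weights so the mixing is visible
H1 = gcn_layer(A, H0, W)
```

Row `H1[idx["j1"]]` is non-zero only in the j1, e1, and e2 columns: the target node keeps its own features and absorbs those of its two environment neighbours, but nothing from j2 or j3.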
Inputting the generated static map into the three-layer graph attention network and using the gated recurrent unit to capture temporal changes in the gene-environment edge weights. Each graph attention layer uses a multi-head attention mechanism (e.g., 8 heads) that learns attention relationships among nodes in multiple different subspaces so as to capture inter-node interaction information more comprehensively. In each layer, attention coefficients between nodes are computed to reflect their relative importance and association strength, and the node representations are updated according to these coefficients so that each node fuses information from its neighbors. Using the rate of change of the environmental parameters as the gating signal, the gated recurrent unit dynamically corrects the gene-environment edge weights as the environmental parameters change, so that the edge weights in the map are updated in real time with the environment and the dynamic gene-environment interaction is captured;
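A sketch of the two mechanisms in this step, under stated assumptions. `gat_attention` computes standard graph attention coefficients per head, alpha_ij = softmax_j(LeakyReLU(a^T [W h_i || W h_j])) over each node's neighbours including itself, with 8 heads as in the example. `gated_edge_update` is hypothetical: it mimics only the update-gate interpolation of a GRU, with a fixed `gain` and `bias` instead of learned parameters, and uses the magnitude of the environmental change rate as the gate input. All parameters are random and untrained.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_attention(H, A, W, a):
    """Multi-head GAT attention coefficients.

    H: (n, f) node features; A: (n, n) adjacency (self-loops added here);
    W: (heads, f, f_out) projections; a: (heads, 2 * f_out) attention vectors.
    Returns alpha: (heads, n, n), each row a softmax over the node's neighbours.
    """
    mask = (A + np.eye(len(A))) > 0
    Z = np.einsum("nf,hfo->hno", H, W)                   # projected features per head
    f_out = W.shape[2]
    src = np.einsum("hno,ho->hn", Z, a[:, :f_out])       # a_1^T W h_i
    dst = np.einsum("hno,ho->hn", Z, a[:, f_out:])       # a_2^T W h_j
    e = leaky_relu(src[:, :, None] + dst[:, None, :])    # a^T [W h_i || W h_j]
    e = np.where(mask[None], e, -np.inf)                 # only attend to neighbours
    e = e - e.max(axis=2, keepdims=True)
    alpha = np.exp(e)
    return alpha / alpha.sum(axis=2, keepdims=True)

def gated_edge_update(w_prev, w_candidate, env_change_rate, gain=10.0, bias=-1.0):
    """Hypothetical GRU-style update gate: z = sigmoid(gain * |rate| + bias)."""
    z = 1.0 / (1.0 + np.exp(-(gain * np.abs(env_change_rate) + bias)))
    return (1.0 - z) * w_prev + z * w_candidate

rng = np.random.default_rng(0)
A = np.zeros((5, 5)); A[0, 3] = A[3, 0] = A[0, 4] = A[4, 0] = 1.0
H = rng.normal(size=(5, 6))
alpha = gat_attention(H, A, rng.normal(size=(8, 6, 4)), rng.normal(size=(8, 8)))
w_temp = gated_edge_update(0.2, 0.8, 0.10)   # 10% temperature change
w_hum = gated_edge_update(0.2, 0.8, 0.05)    # 5% humidity change
```

A larger environmental change opens the gate further, so the corrected edge weight moves closer to the newly computed candidate.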
Defining a gene-environment interaction strength index $S_{ge}$, computed as the edge weight $w_{ge}$ multiplied by the cosine similarity of the node embeddings, $\cos(\mathbf{h}_g,\mathbf{h}_e)$, i.e., $S_{ge}=w_{ge}\cdot\cos(\mathbf{h}_g,\mathbf{h}_e)$. Based on this index, the top-N (e.g., N = 50) strongly associated edges are screened out, where N is a preset positive integer that limits how many edges are selected after all gene-environment edges are ranked by index value. This focuses the analysis on the edges that play a key role in gene-environment interaction and bounds the candidate set for the subsequent screening of key feature edges;
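The index and the top-N selection can be sketched as follows; the array layout (one row per gene-environment edge, with the embeddings of its two endpoints) is an assumption, and N = 50 in the example above.

```python
import numpy as np

def interaction_strength(w, h_gene, h_env):
    """S_ge = w_ge * cos(h_g, h_e), one row per gene-environment edge."""
    cos = np.sum(h_gene * h_env, axis=1) / (
        np.linalg.norm(h_gene, axis=1) * np.linalg.norm(h_env, axis=1))
    return w * cos

def top_n_edges(scores, n):
    """Indices of the n edges with the largest index values, highest first."""
    return np.argsort(scores)[::-1][:n]

w = np.array([1.0, 0.5, 2.0])
h_gene = np.array([[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]])
h_env = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = interaction_strength(w, h_gene, h_env)   # [1.0, 0.0, 1.414...]
top = top_n_edges(S, 2)                       # [2, 0]
```

The second edge has a moderate weight but orthogonal embeddings, so its index is zero; the third edge ranks first because both its weight and its similarity are high.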
Introducing the graph contrastive learning framework to further analyze the top-N strongly associated edges: the static map is perturbed by randomly modifying node features and edge weights, and the interaction strength change rate $\Delta S_{ge}$ is computed by comparing the node representations of the map before and after perturbation:
$$\Delta S_{ge}=\frac{\left|S'_{ge}-S_{ge}\right|}{\left|S_{ge}\right|}$$
where $S_{ge}$ and $S'_{ge}$ denote the gene-environment interaction strength index of edge $(g,e)$ computed from the map before and after perturbation, respectively;
Setting a change rate threshold M, screening out key feature edges whose interaction strength change rate is greater than or equal to M, and integrating all screened key feature edges into the static map to form the final dynamic map, namely the updated cross-domain feature interaction map;
For example, if the temperature changes by 10% and the humidity by 5% within one month, the gene-environment edge weights are dynamically corrected according to these change rates; the gene-environment interaction strength index is computed and the top 50 strongly associated edges by index value are selected; the static map is perturbed, for example by randomly modifying 5% of the node features and 3% of the edge weights; the interaction strength change rate is computed by comparing the node representations of the map before and after perturbation; and, with the change rate threshold M set to 0.1, 30 key feature edges with a change rate greater than or equal to 0.1 are screened out and integrated into the static map to form the final dynamic map.
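A sketch of the perturbation and screening step, under stated assumptions: the change rate is taken as the relative change |S' - S| / |S| of each edge's interaction strength index; the perturbation fractions default to the 5% of node features and 3% of edge weights in this example, with a hypothetical noise `scale`; M = 0.1 as above; and node embeddings come from an `encode` callable that defaults to the identity, where a real system would re-run the trained graph network on the perturbed map. Helper names are hypothetical.

```python
import numpy as np

def perturb(feats, w, feat_frac=0.05, edge_frac=0.03, scale=0.5, rng=None):
    """Randomly modify a fraction of node-feature entries and of edge weights."""
    rng = np.random.default_rng() if rng is None else rng
    f, w = feats.copy(), w.copy()
    f_mask = rng.random(f.shape) < feat_frac
    f[f_mask] += rng.normal(0.0, scale, f_mask.sum())
    w_mask = rng.random(w.shape) < edge_frac
    w[w_mask] *= 1.0 + rng.normal(0.0, scale, w_mask.sum())
    return f, w

def strength(w, emb, edges):
    """Interaction strength index S = w * cosine similarity for each (gene, env) edge."""
    hg, he = emb[edges[:, 0]], emb[edges[:, 1]]
    cos = np.sum(hg * he, axis=1) / (np.linalg.norm(hg, axis=1) * np.linalg.norm(he, axis=1))
    return w * cos

def key_edges(w, feats, edges, threshold_m=0.1, rng=None, encode=lambda x: x, **perturb_kwargs):
    """Edges whose relative change |S' - S| / |S| after perturbation is >= M."""
    s_before = strength(w, encode(feats), edges)
    f_p, w_p = perturb(feats, w, rng=rng, **perturb_kwargs)
    s_after = strength(w_p, encode(f_p), edges)
    rate = np.abs(s_after - s_before) / np.maximum(np.abs(s_before), 1e-12)
    return np.flatnonzero(rate >= threshold_m), rate

rng = np.random.default_rng(0)
feats = rng.normal(size=(10, 4))              # nodes 0-4: genes, 5-9: environment (hypothetical)
edges = np.array([[0, 5], [1, 6], [2, 7], [3, 8], [4, 9]])
w = rng.uniform(0.5, 1.0, size=5)
selected, rate = key_edges(w, feats, edges, threshold_m=0.1, rng=np.random.default_rng(1))
```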
It should be further noted that, in a specific implementation, the process by which the multi-modal feature fusion and optimization module analyzes the cross-domain feature interaction map through the preset four-branch multi-head attention architecture, generates dual weights and fused features in combination with the cross-modal attention mechanism, and applies dual optimization after multi-head attention screening to generate the optimized comprehensive feature vector fusing multi-dimensional information is as follows:
Inputting the constructed cross-domain feature interaction map into a preset four-branch multi-head attention submodule, which is divided into four units: a gene focus head, an environment focus head, a phenotype focus head, and an interaction focus head, which process gene, environment, phenotype, and gene-environment interaction features, respectively;
After the units are divided, each focus head independently performs its internal computation, including computing the query, key, and value matrices; at the same time, a cross-modal attention mechanism is introduced between the gene and environment branches, which computes the dot product of the gene features and the environment features to obtain the cross-correlation strength between them;
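The per-branch computation and the gene-environment cross-modal step can be sketched with scaled dot-product attention. Assumptions: each branch receives a set of token vectors (one per row), the projection matrices are random placeholders for learned ones, and in the cross-modal step gene tokens act as queries while environment tokens act as keys and values; the description specifies only that a dot product of gene and environment features gives the cross-correlation strength.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    return softmax(Q @ K.T / np.sqrt(Q.shape[-1])) @ V

def focus_head(X, Wq, Wk, Wv):
    """One branch: its own query, key, and value projections, then self-attention."""
    return attention(X @ Wq, X @ Wk, X @ Wv)

def cross_modal(gene, env, Wq, Wk, Wv):
    """Gene tokens query environment tokens; the scaled dot products are the cross-correlation strengths."""
    Q, K, V = gene @ Wq, env @ Wk, env @ Wv
    strength = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(strength) @ V, strength

rng = np.random.default_rng(0)
d, d_k = 6, 4
gene_tokens = rng.normal(size=(5, d))       # e.g. 5 gene tokens
env_tokens = rng.normal(size=(3, d))        # e.g. 3 environment tokens
proj = lambda: rng.normal(size=(d, d_k))    # random stand-ins for learned projections
gene_out = focus_head(gene_tokens, proj(), proj(), proj())
cross_out, cross_strength = cross_modal(gene_tokens, env_tokens, proj(), proj(), proj())
```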
Concatenating the outputs of the four branches (the gene, environment, phenotype, and interaction focus heads), so that the concatenated feature vector contains the combined information of genes, environments, phenotypes, and gene-environment interactions; feeding the concatenated vector into a multi-layer perceptron (MLP), which further analyzes it through nonlinear transformation and feature extraction to generate a four-dimensional weight matrix; and normalizing this matrix with a Softmax function to obtain the gene-environment interaction weight and the phenotype response weight, where the gene-environment interaction weight reflects the importance of the interaction between genes and environment and the phenotype response weight reflects how sensitive the phenotype is to changes in genes and environment;
Performing a weighted summation of the gene, environment, phenotype, and gene-environment interaction features based on the gene-environment interaction weight and the phenotype response weight to form an initial fusion vector;
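A sketch of the MLP weighting and the weighted summation, assuming that all four branch outputs share one dimension and that the four Softmax outputs are used as one weight per branch. How these four values map onto the named gene-environment interaction weight and phenotype response weight is not specified, so the sketch does not split them. Weight shapes are illustrative.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def fuse_branches(branches, W1, b1, W2, b2):
    """Concatenate four branch vectors, map them through a two-layer MLP to four logits,
    normalise with Softmax, and return the weighted sum of the branches."""
    spliced = np.concatenate(branches)              # concatenated feature vector
    hidden = np.maximum(spliced @ W1 + b1, 0.0)     # non-linear transformation
    weights = softmax(hidden @ W2 + b2)             # four weights that sum to 1
    fused = sum(wt * vec for wt, vec in zip(weights, branches))
    return fused, weights

rng = np.random.default_rng(0)
d = 6
branches = [rng.normal(size=d) for _ in range(4)]   # gene, environment, phenotype, interaction
W1, b1 = rng.normal(size=(4 * d, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 4)), np.zeros(4)
fused, weights = fuse_branches(branches, W1, b1, W2, b2)
```

With an all-zero output layer every branch receives weight 0.25 and the fusion reduces to a plain average, which is a convenient sanity check.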
A multi-head self-attention mechanism captures the associations between the elements of the initial fusion vector in several different subspaces and computes a global association estimate for each element, which reflects that element's importance within the whole vector. A global association threshold K (e.g., 0.8) is set, and the elements whose global association estimate is greater than or equal to K are selected as the core feature subset, which contains the most critical information in the initial fusion vector;
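One way to turn multi-head self-attention into a per-element "global association estimate" is sketched below, under explicit assumptions: each element of the fusion vector is treated as a 1-D token, the estimate for an element is the average attention it receives across all queries and heads, and the scores are rescaled so the top element equals 1.0 so that a threshold such as K = 0.8 is meaningful. The projections are random and untrained, and `global_association` and `select_core` are hypothetical names; the text does not fix any of these choices.

```python
import numpy as np


def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def global_association(fused, n_heads=4, d_head=4, seed=0):
    rng = np.random.default_rng(seed)
    tokens = fused[:, None]                     # each element becomes a 1-D token
    received = np.zeros(len(fused))
    for _ in range(n_heads):                    # each head projects into its own subspace
        wq = rng.normal(size=(1, d_head))
        wk = rng.normal(size=(1, d_head))
        attn = softmax((tokens @ wq) @ (tokens @ wk).T / np.sqrt(d_head))
        received += attn.mean(axis=0)           # average attention each element receives
    received /= n_heads
    return received / received.max()            # rescale so the top element scores 1.0


def select_core(fused, K=0.8):
    scores = global_association(fused)
    keep = np.flatnonzero(scores >= K)          # indices of the core feature subset
    return keep, fused[keep], scores


rng = np.random.default_rng(2)
fused = rng.normal(size=12)                     # stand-in for the initial fusion vector
keep, core, scores = select_core(fused, K=0.8)
```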
A residual connection sub-module is provided. The features of the cross-domain feature interaction map (which contain the gene, environment, phenotype and gene-environment interaction feature information) and the screened core feature subset are input into a convolutional neural network to generate a residual correction term. Non-negative matrix factorization is then applied to the residual correction term, decomposing the high-dimensional term into a low-dimensional latent factor matrix and a coefficient matrix; the latent factors are the latent feature patterns contained in the residual correction term. Understandably, in the gene feature dimension the latent factors correspond to the core components of gene expression patterns in the gene regulatory network; in the environment dimension they reflect key combinations of environmental factors affecting rape traits, or latent drivers of environmental change; in the phenotype dimension they represent latent sources of phenotypic variation or of phenotype classes; and for gene-environment interactions they reflect the latent mechanisms of those interactions. The latent factors are extracted as constituent units of the optimized comprehensive feature vector, concatenated with the previously screened core features, and weighted by attention to generate the final optimized comprehensive feature vector.
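The non-negative matrix factorization step can be illustrated with the standard Lee-Seung multiplicative updates. This is a minimal sketch under assumptions: the CNN is not shown and its residual correction term is replaced by a synthetic non-negative matrix (samples x features); because a CNN output can be negative, a ReLU clip before factorization is assumed; rows of `H` are read as latent factors over features and rows of `W` as each sample's factor activations. The final attention weighting is not shown, and `nmf` is a hypothetical helper name.

```python
import numpy as np


def nmf(V, rank, n_iter=1000, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||V - W H||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps   # coefficient matrix (samples x factors)
    H = rng.random((rank, m)) + eps   # latent factor matrix (factors x features)
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


rng = np.random.default_rng(3)
# Stand-in for the CNN residual correction term: 20 samples x 30 features.
residual = rng.random((20, 3)) @ rng.random((3, 30))
R = np.maximum(residual, 0.0)          # NMF needs non-negative input (assumed ReLU clip)
W, H = nmf(R, rank=3)
rel_err = np.linalg.norm(R - W @ H) / np.linalg.norm(R)

core_features = rng.normal(size=(20, 4))       # made-up screened core features per sample
optimized = np.hstack([core_features, W])      # splice factor activations with core features
```

The multiplicative form keeps both factors non-negative at every step, which is what makes the factors readable as additive "patterns" in the sense described above.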
It should be further noted that, in a specific implementation, the feature evaluation analysis module decodes and analyzes the optimized comprehensive feature vector to construct the rape seed quality evaluation model and extract high-yield rape variety data, as follows:
The feature domains are decoupled with a sparse autoencoder: an L1 regularization constraint forces the hidden-layer activations to be sparse, separating out the gene feature sub-vector $G \in \mathbb{R}^{d_g}$, the environment feature sub-vector $E \in \mathbb{R}^{d_e}$, the phenotype feature sub-vector $P \in \mathbb{R}^{d_p}$ and the interaction feature sub-vector $I \in \mathbb{R}^{d_i}$, where $\mathbb{R}$ denotes the real space (i.e., each feature value is a concrete number) and $d$ denotes the dimension of the corresponding sub-vector (i.e., the number of parameters describing the gene, environment, phenotype or interaction features). Each sub-vector corresponds to an independent feature domain and represents one semantically well-defined class of features in the original feature vector;
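A sparse autoencoder with an L1 penalty on the hidden activations can be sketched in NumPy as follows. Assumptions: one ReLU hidden layer, full-batch gradient descent with hand-written gradients, random made-up data, and the hidden code is cut into four consecutive blocks of sizes (d_g, d_e, d_p, d_i) that are simply *assigned* to the gene, environment, phenotype and interaction domains. Sparsity alone does not guarantee that the blocks become semantically separate; `SparseAutoencoder` and its methods are hypothetical names.

```python
import numpy as np


class SparseAutoencoder:
    """One-hidden-layer autoencoder; an L1 penalty keeps hidden activations sparse."""

    def __init__(self, d_in, dims, lam=1e-2, lr=0.2, seed=0):
        rng = np.random.default_rng(seed)
        self.dims = dims                       # hidden units per domain (d_g, d_e, d_p, d_i)
        h = sum(dims)
        self.W1 = rng.normal(0.0, 1.0 / np.sqrt(d_in), (d_in, h))
        self.b1 = np.zeros(h)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(h), (h, d_in))
        self.b2 = np.zeros(d_in)
        self.lam, self.lr = lam, lr

    def encode(self, X):
        return np.maximum(X @ self.W1 + self.b1, 0.0)   # ReLU hidden layer

    def loss(self, X):
        H = self.encode(X)
        recon = H @ self.W2 + self.b2
        return np.mean((recon - X) ** 2) + self.lam * np.mean(np.abs(H))

    def step(self, X):
        # One full-batch gradient-descent step on reconstruction MSE + L1(H).
        pre = X @ self.W1 + self.b1
        H = np.maximum(pre, 0.0)
        recon = H @ self.W2 + self.b2
        d_recon = 2.0 * (recon - X) / recon.size
        d_H = d_recon @ self.W2.T + self.lam * np.sign(H) / H.size
        d_pre = d_H * (pre > 0)
        self.W2 -= self.lr * (H.T @ d_recon)
        self.b2 -= self.lr * d_recon.sum(axis=0)
        self.W1 -= self.lr * (X.T @ d_pre)
        self.b1 -= self.lr * d_pre.sum(axis=0)

    def split(self, X):
        # Cut the hidden code into consecutive blocks: G, E, P, I.
        bounds = np.cumsum(self.dims)[:-1]
        return np.split(self.encode(X), bounds, axis=1)


rng = np.random.default_rng(42)
X = rng.normal(size=(64, 12))                  # made-up optimized feature vectors
sae = SparseAutoencoder(12, dims=(3, 2, 2, 1))
before = sae.loss(X)
for _ in range(500):
    sae.step(X)
after = sae.loss(X)
G, E, P, I = sae.split(X)
```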
For each decoupled feature domain (gene/environment/phenotype/interaction), a corresponding capsule processing unit is constructed, forming a parallel feature-domain processing pipeline. The information-transfer weights between sub-vectors are determined by an iterative routing algorithm: initial routing coefficients are set and then adjusted according to the agreement between the predictions of the lower-layer capsules and the higher-layer capsules. Specifically, the gene pipeline is obtained by initializing a 1D-CNN convolution kernel; the environment pipeline is obtained by loading pre-trained ResNet-18 weights; the phenotype pipeline is obtained by constructing a knowledge distillation sub-module that loads prior knowledge of the plant phenome; and the gene-environment interaction pipeline is obtained by initializing a bilinear fusion matrix;
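The iterative routing step reads like routing-by-agreement as used in capsule networks; the sketch below implements that standard form (squash nonlinearity, softmax coupling coefficients, logits increased by the agreement between predictions and outputs). Assumptions: the four domain capsules are the lower layer, there are three upper capsules, and the transformation matrices producing the predictions `u_hat` are random placeholders. The function names are hypothetical and this is not presented as the patent's exact routing rule.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    # Shrink each vector's length into [0, 1) while keeping its direction.
    sq = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)


def dynamic_routing(u_hat, n_iter=3):
    """u_hat: (n_lower, n_upper, d) predictions of lower capsules for upper capsules."""
    n_lower, n_upper, _ = u_hat.shape
    b = np.zeros((n_lower, n_upper))                    # initial routing logits
    for _ in range(n_iter):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)               # coupling coefficients per lower capsule
        s = np.einsum("ij,ijd->jd", c, u_hat)           # weighted sum of predictions
        v = squash(s)                                   # upper-capsule outputs
        b = b + np.einsum("ijd,jd->ij", u_hat, v)       # raise logits where predictions agree
    return v, c


rng = np.random.default_rng(0)
u = rng.normal(size=(4, 8))                    # gene / environment / phenotype / interaction capsules
W = rng.normal(0.0, 0.3, size=(4, 3, 8, 8))    # hypothetical transforms to 3 upper capsules
u_hat = np.einsum("ijkd,id->ijk", W, u)
v, c = dynamic_routing(u_hat)
```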
The decoupled feature sub-vectors are aggregated with the routing weights and propagated layer by layer through the capsule network. First, in a local feature fusion stage, the capsules of adjacent feature domains (e.g., gene-environment and environment-phenotype) are combined pairwise and their local interaction patterns are refined through dynamic routing; then, in a global feature fusion stage, all feature-domain capsules are integrated into a unified high-level representation, forming a progressive local-to-global feature refinement path. Specifically:
Gene feature refinement: the gene feature sub-vector G is input into the 1D-CNN, which outputs a primary gene pattern; sequence labeling through a CRF layer identifies disease-resistance-related SNP combination patterns, and a gene quality score is generated from the result using an activation function and a gene feature weight matrix;
Environment feature refinement: E is input into a spatio-temporal Transformer, whose separable attention mechanism decomposes it into static attributes Es (soil type, etc.) and a dynamic time series Ed (precipitation/temperature sequences), and an environmental adaptability score is generated, in which ME is a multi-layer decision processor (a multi-layer neural network that mines the patterns in the static environmental features), LE is a time-series memory sub-module (which processes environmental data that change over time, such as temperature and precipitation), and a feature fusion operation combines the static-attribute and dynamic time-series information;
Phenotype feature refinement: a cross-layer graph is constructed whose nodes are phenotype features and whose edges represent the genetic correlations between phenotypes; information is propagated through a GAT network to update the phenotype representation P', and a phenotype integrity score is generated as an attention-weighted combination over the phenotype feature dimensions i (specific trait indices such as plant height, pod number and thousand-seed weight);
Interaction feature refinement: bilinear pooling is applied to the interaction features to generate an interaction strength score, where tanh is the bidirectional activation function, the superscript T denotes matrix transposition, and the score combines the interaction feature weight matrix, the reference correction parameter calculated for the interaction features (i.e., the interaction feature bias term), and the F-norm (Frobenius norm) of the bilinearly pooled interaction feature matrix, which contains the multidimensional gene-environment interaction information;
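Two of the refinement steps above can be sketched concretely. The phenotype step is illustrated with a standard single-head GAT layer (LeakyReLU-scored attention masked to graph neighbours), and the interaction step with an outer-product bilinear pooling followed by Frobenius-norm normalization and tanh. The original score formulas did not survive, so the form `tanh(trace(W^T B) + b)` is an **assumption** built only from the ingredients the text names (tanh, transpose, weight matrix, bias, F-norm). The trait features, correlation matrix and weights are all made up, and the function names are hypothetical.

```python
import numpy as np


def gat_layer(X, A, W, a, slope=0.2):
    """Single-head GAT-style layer.

    X: (n, f) node features, A: (n, n) adjacency with self-loops,
    W: (f, f2) shared projection, a: (2 * f2,) attention vector.
    """
    Z = X @ W
    f2 = Z.shape[1]
    e = (Z @ a[:f2])[:, None] + (Z @ a[f2:])[None, :]  # e_ij = a^T [z_i || z_j]
    e = np.where(e > 0, e, slope * e)                  # LeakyReLU
    e = np.where(A > 0, e, -np.inf)                    # attend only to graph neighbours
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)
    return alpha @ Z, alpha


def bilinear_interaction_score(g, env, W_i, b_i):
    """Hypothetical interaction strength score from bilinear pooling."""
    B = np.outer(g, env)                               # bilinear pooling of the two vectors
    B = B / (np.linalg.norm(B, "fro") + 1e-12)         # Frobenius-norm normalisation
    return np.tanh(np.trace(W_i.T @ B) + b_i)          # bounded in (-1, 1)


rng = np.random.default_rng(4)
# 3 trait nodes (plant height, pod number, thousand-seed weight), made-up 6-dim features.
P = rng.normal(size=(3, 6))
corr = np.array([[1.0, 0.6, 0.1],                      # hypothetical genetic correlations
                 [0.6, 1.0, 0.4],
                 [0.1, 0.4, 1.0]])
A = (np.abs(corr) > 0.3).astype(float)                 # diagonal gives self-loops
P_updated, alpha = gat_layer(P, A, rng.normal(0.0, 0.5, (6, 6)), rng.normal(size=12))

g, env = rng.normal(size=5), rng.normal(size=3)
score = bilinear_interaction_score(g, env, rng.normal(size=(5, 3)), 0.1)
```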
A multi-scale feature fusion mechanism is set up on the results of the progressive feature refinement, and the final fused features comprise two levels: local feature fusion, which retains interaction details through a weighted combination of adjacent feature-domain capsules (e.g., a gene-environment fusion capsule), and global feature fusion, which integrates all feature-domain capsules into a unified high-level representation;
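The two-level output can be illustrated with a toy example. Assumptions: all domain capsules have the same length, each adjacent pair is fused with a single scalar weight, the global level is a plain mean over all domain capsules (a stand-in for the routing-based global fusion), and the two levels are concatenated. `multi_scale_fusion` is a hypothetical name.

```python
import numpy as np


def multi_scale_fusion(caps, pairs, alphas):
    """caps: dict domain -> capsule vector (same length d for all domains)."""
    # Local level: weighted combination of each adjacent-domain pair.
    local = [w * caps[a] + (1.0 - w) * caps[b] for (a, b), w in zip(pairs, alphas)]
    # Global level: one summary over all domain capsules (mean used as a stand-in).
    global_vec = np.mean([caps[k] for k in caps], axis=0)
    return np.concatenate(local + [global_vec])


d = 4
caps = {"gene": np.ones(d), "environment": np.zeros(d),
        "phenotype": np.full(d, 2.0), "interaction": np.full(d, 3.0)}
pairs = [("gene", "environment"), ("environment", "phenotype")]
fused = multi_scale_fusion(caps, pairs, alphas=[0.5, 0.25])
# fused -> four 0.5s (gene-environment), four 1.5s (environment-phenotype), four 1.5s (global)
```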
A feature importance assessment method (e.g., tree-model-based feature importance) is used to quantify the local and global fusion features obtained by the multi-scale feature fusion mechanism, determining how much each feature domain, and each combination of domains, contributes to labeling rape varieties;
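To keep the example dependency-free, the sketch below swaps in **permutation importance on a least-squares model** for the tree-based importance mentioned above; with scikit-learn installed, a fitted `RandomForestRegressor`'s `feature_importances_` attribute would be the closer match. Per-feature importances are then summed into per-domain shares. The data, the one-column-per-domain layout and the helper names are made up for illustration.

```python
import numpy as np


def fit_linear(X, y):
    # Least-squares model with intercept; stands in for the tree model.
    Xb = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return lambda Z: np.hstack([Z, np.ones((len(Z), 1))]) @ coef


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    imp = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break the link between feature j and y
            imp[j] += np.mean((predict(Xp) - y) ** 2) - base
    return imp / n_repeats


def domain_contribution(imp, domain_of):
    # Sum (non-negative) importances per feature domain and normalise to shares.
    totals = {}
    for j, dom in enumerate(domain_of):
        totals[dom] = totals.get(dom, 0.0) + max(imp[j], 0.0)
    s = sum(totals.values()) or 1.0
    return {dom: v / s for dom, v in totals.items()}


rng = np.random.default_rng(5)
X = rng.normal(size=(300, 4))            # made-up fused features, one column per domain
y = 3.0 * X[:, 0] + 0.1 * X[:, 2] + 0.05 * rng.normal(size=300)
predict = fit_linear(X, y)
imp = permutation_importance(predict, X, y)
share = domain_contribution(imp, ["gene", "environment", "phenotype", "interaction"])
```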
Based on the quantitative results, the feature domains and combinations that play a key role in labeling high-yield rape varieties (e.g., certain gene-locus combinations in the gene feature sub-vector, or temperature-humidity combinations in the environment feature sub-vector) are selected as the criteria for subsequent labeling and screening, and features with small or no contribution are discarded. High-yield rape variety data matching the key features are then extracted (for example, 200 high-yield rape variety records), and a labeling result is finally generated for each rape variety.
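The screening and labeling step might look like the sketch below. Assumptions: "key" features are the most important ones that together cover 90% of total importance, records are scored by an importance-weighted sum over the key features only, and the top `k` records are labeled high-yield (k = 200 mirrors the example count in the text). The importance values and records are made up, and the function names are hypothetical.

```python
import numpy as np


def select_key_features(imp, coverage=0.9):
    # Keep the most important features until they cover `coverage` of total importance.
    imp = np.clip(np.asarray(imp, dtype=float), 0.0, None)
    order = np.argsort(imp)[::-1]
    share = np.cumsum(imp[order]) / imp.sum()
    n_keep = min(int(np.searchsorted(share, coverage)) + 1, len(imp))
    return np.sort(order[:n_keep])


def label_top_records(X, key, weights, k):
    # Score each record on the key features only and label the top-k as high-yield.
    score = X[:, key] @ weights
    top = np.argsort(score)[::-1][:k]
    labels = np.zeros(len(X), dtype=int)
    labels[top] = 1
    return top, labels


imp = np.array([0.5, 0.05, 0.3, 0.15])
key = select_key_features(imp)           # -> array([0, 2, 3]); feature 1 is discarded
rng = np.random.default_rng(6)
X = rng.normal(size=(1000, 4))           # made-up candidate variety records
top, labels = label_top_records(X, key, weights=imp[key], k=200)
```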
It should be further noted that, in a specific implementation, the transfer learning adaptation deployment module uses the extracted high-yield rape variety data to deploy a generative adversarial network and adapts the quality evaluation model to the feature space of new varieties through transfer learning, so as to intelligently evaluate the quality of new rape seeds, as follows:
The extracted high-yield rape variety data are used as source-domain data and input into a preset adversarial training sub-module to pre-train the quality evaluation model, so that the model learns general feature representations related to rape seed quality. A transfer learning strategy then transfers the knowledge the pre-trained model acquired on the high-yield rape variety dataset to the task of evaluating the seed quality of new rape varieties; at the same time, the parameters of the quality evaluation model are tuned according to the characteristics of the new-variety data, so that the model can accurately evaluate rape seed quality on the new-variety data.
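The pre-train-then-adapt pattern can be sketched with a logistic-regression stand-in for the quality evaluation model. Assumptions: the adversarial pre-training sub-module is **not** shown (plain supervised pre-training on the source data replaces it), and fine-tuning on the small new-variety set uses an L2 penalty pulling the weights toward the pre-trained ones (the technique often called L2-SP), which is one common way to "adjust parameters" without forgetting the source knowledge. All data and the shifted "new variety" relationship are synthetic, and the helper names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_loss(X, y, w):
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def train(X, y, w0, anchor=None, mu=0.0, lr=0.1, steps=500):
    """Gradient descent on log loss; mu > 0 adds an L2-SP pull toward `anchor`."""
    w = w0.copy()
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        if anchor is not None:
            grad = grad + mu * (w - anchor)
        w -= lr * grad
    return w


rng = np.random.default_rng(7)
w_src_true = np.array([1.0, -2.0, 0.5, 0.0, 1.0])
w_new_true = w_src_true + np.array([0.0, 0.0, 1.0, 1.0, 0.0])  # shifted new-variety relation

Xs = rng.normal(size=(500, 5))                                  # source: high-yield variety data
ys = (Xs @ w_src_true + 0.3 * rng.normal(size=500) > 0).astype(float)
Xt = rng.normal(loc=0.5, size=(60, 5))                          # target: small new-variety set
yt = (Xt @ w_new_true + 0.3 * rng.normal(size=60) > 0).astype(float)

w_src = train(Xs, ys, np.zeros(5))                                      # pre-training
w_tgt = train(Xt, yt, w_src, anchor=w_src, mu=0.05, lr=0.1, steps=200)  # fine-tuning
```

Starting from `w_src` and penalizing drift away from it lets the few target samples correct the model where the new variety differs, while the remaining weights stay close to what was learned on the larger source set.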
It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution; the order of execution of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation of the embodiments of the present application.
It should be understood that determining B from A does not mean that B is determined from A alone; B may also be determined from A and/or other information.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto; any variation or substitution readily conceivable by a person skilled in the art falls within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Finally, the foregoing description of the preferred embodiment of the invention is provided for the purpose of illustration only, and is not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.