CN120196911A - A rapeseed seed quality evaluation model based on big data - Google Patents

A rapeseed seed quality evaluation model based on big data

Info

Publication number
CN120196911A
Authority
CN
China
Prior art keywords
feature
data
rape
gene
interaction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202510678901.4A
Other languages
Chinese (zh)
Other versions
CN120196911B (en
Inventor
雷波
李楠
孙奥
任鸿昌
黄秋婉
唐梦诗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Maiqugeng Technology Co ltd
Original Assignee
Beijing Maiqugeng Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Maiqugeng Technology Co ltd
Priority to CN202510678901.4A
Publication of CN120196911A
Application granted
Publication of CN120196911B
Legal status: Active
Anticipated expiration

Abstract

Translated from Chinese

本发明涉及种子处理分析技术领域,尤其涉及一种基于大数据的油菜种子品质评估模型,包括通过汇聚多源异构油菜数据构建数据池,为后续跨域特征关联提供高质量输入;通过构建跨域特征交互图谱,并运用图神经网络进行学习,直观呈现跨域特征交互,分析基因‑环境互作关系;通过四分支多头与跨模态注意力解析图谱,聚焦关键特征、融合优化信息,生成优质特征向量,实现高阶特征非线性融合,显著提升模型对多源异构数据的表征能力;通过解码向量构建评估模型并提取高产品种数据,有利于高效筛选油菜品种;通过部署生成对抗网络并迁移学习适配模型,快速拓展评估能力至新品种,推动育种智能化。

The present invention relates to the technical field of seed processing and analysis, and in particular to a rapeseed seed quality evaluation model based on big data. The model builds a data pool by aggregating multi-source heterogeneous rapeseed data, providing high-quality input for subsequent cross-domain feature association; constructs a cross-domain feature interaction map and learns it with a graph neural network, so as to present cross-domain feature interactions intuitively and analyze gene-environment interaction relationships; parses the map with four-branch multi-head attention and cross-modal attention, focusing on key features and fusing and optimizing information to generate high-quality feature vectors, which realizes nonlinear fusion of high-order features and significantly improves the model's ability to represent multi-source heterogeneous data; decodes the vectors to build an evaluation model and extract high-yield variety data, which helps screen rapeseed varieties efficiently; and deploys a generative adversarial network and adapts the model by transfer learning, so that the evaluation capability extends quickly to new varieties and supports intelligent breeding.

Description

Rapeseed seed quality evaluation model based on big data
Technical Field
The invention relates to the technical field of seed processing and analysis, and in particular to a rapeseed seed quality evaluation model based on big data.
Background
Current rapeseed seed quality evaluation still faces several challenges. First, the evaluation involves multi-source heterogeneous data such as gene, environment, and phenotype data; the data come from many sources in varied formats, effective integration means are lacking, data utilization efficiency is low, and the value of the data is hard to exploit fully. Second, rapeseed seed quality is influenced jointly by genes and the environment, and the gene-environment interaction relationship is complex; traditional methods struggle to mine this interaction in depth, so the mechanism by which seed quality forms is not well understood. Third, evaluating seed quality requires considering multi-dimensional features such as genes, environment, and phenotype, but traditional methods fuse features insufficiently and cannot fully integrate information from different dimensions, which makes evaluation results inaccurate. Fourth, as rapeseed varieties are continuously updated, traditional quality evaluation models cannot adapt quickly to the feature space of new varieties, so the quality evaluation of new varieties is inaccurate. The invention therefore provides a rapeseed seed quality evaluation model based on big data.
Disclosure of Invention
The invention aims to solve the problems described in the background and provides a rapeseed seed quality evaluation model based on big data.
To achieve this purpose, the invention adopts the following technical scheme:
A rapeseed seed quality evaluation model based on big data, comprising:
a multi-source heterogeneous data acquisition and preprocessing module, used for acquiring multi-source heterogeneous rapeseed data and processing and analyzing the acquired data to generate a multi-source heterogeneous rapeseed data pool;
a cross-domain feature interaction map construction module, used for constructing a cross-domain feature interaction map based on the multi-source heterogeneous rapeseed data pool, performing deep feature learning on the constructed map with a graph neural network, analyzing the gene-environment interaction relationship, and outputting an updated cross-domain feature interaction map;
a multi-modal feature fusion and optimization module, used for parsing the cross-domain feature interaction map through a preset four-branch multi-head attention architecture, generating dual weights and fused features in combination with a cross-modal attention mechanism, and, after multi-head attention screening, generating through dual optimization an optimized comprehensive feature vector that fuses multi-dimensional information;
a feature evaluation and analysis module, used for decoding and analyzing the optimized comprehensive feature vector so as to construct the rapeseed seed quality evaluation model and extract high-yield rapeseed variety data; and
a transfer learning adaptation and deployment module, used for deploying a generative adversarial network with the extracted high-yield rapeseed variety data, adapting the quality evaluation model to the feature space of new varieties through transfer learning, and completing intelligent evaluation of the seed quality of new rapeseed varieties.
Further, the process by which the multi-source heterogeneous data acquisition and preprocessing module acquires multi-source heterogeneous rapeseed data and processes and analyzes the acquired data to generate the multi-source heterogeneous rapeseed data pool comprises the following steps:
extracting rapeseed whole-genome SNP marker data from a gene database, collecting field phenotype data through an Internet of Things sensor network, accessing the real-time monitoring data stream of a weather station to obtain field environment parameters, and acquiring dynamic image data over the rapeseed growth cycle with an unmanned aerial vehicle (UAV) carrying a multispectral camera;
storing all collected data sets uniformly in an HBase distributed database and preprocessing them, namely: removing duplicate sites and low-quality sites from the whole-genome SNP marker data through data cleaning and retaining the set of valid SNP sites to obtain a genotype data set; parsing the time-series records of the field environment parameters and establishing an environment parameter time-series index; retrieving the UAV aerial image data, extracting features from the dynamic image data with computer vision techniques, labeling image timestamps in segments by growth period, and extracting vegetation indices as primary phenotype features; and aligning the collected field phenotype data with the extracted primary phenotype features in time and space to eliminate sampling-time deviation, thereby obtaining collaborative phenotype features;
constructing an SNP-chromosome association matrix based on the genotype data set to serve as the gene-dimension base framework of the rapeseed raw data pool; mapping the environment parameter time-series index and the collaborative phenotype features onto the gene-dimension base framework by timestamp to establish cross-domain data associations; using the column-family storage model of the HBase distributed database to divide the data into column families by genotype, environment, and phenotype, and setting a composite encoding rule for the row key; and performing data integrity verification, checking the consistency of the cross-domain data through hash check codes, to complete construction of the rapeseed raw data pool;
standardizing the rapeseed raw data pool to obtain a rapeseed standardized data pool.
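The row-key and integrity-check steps above can be illustrated with a minimal sketch. It does not talk to HBase; a plain dictionary stands in for one row. The key layout (`variety#plot#timestamp`), the use of SHA-256 as the hash check code, and all function and field names are assumptions made for illustration, since the text does not specify them.

```python
import hashlib

def make_row_key(variety_id, plot_id, timestamp):
    """Composite row key (assumed layout): variety, plot, timestamp joined by '#'."""
    return f"{variety_id}#{plot_id}#{timestamp}"

def record_checksum(columns):
    """SHA-256 over a canonical, key-sorted rendering of one column family."""
    canonical = "|".join(f"{k}={columns[k]}" for k in sorted(columns))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def build_row(variety_id, plot_id, timestamp, genotype, environment, phenotype):
    """One row with three column families and a check code per family."""
    families = {"genotype": genotype, "environment": environment, "phenotype": phenotype}
    checks = {name: record_checksum(cols) for name, cols in families.items()}
    return {"row_key": make_row_key(variety_id, plot_id, timestamp),
            "families": families, "checks": checks}

def verify_row(row):
    """True when every column family still matches its stored check code."""
    return all(record_checksum(cols) == row["checks"][name]
               for name, cols in row["families"].items())

# Hypothetical sample record.
row = build_row("V001", "P12", "2024-03-15",
                {"snp_0001": "AA"}, {"soil_humidity": 0.31}, {"ndvi": 0.62})
print(row["row_key"], verify_row(row))
```

Putting the variety first in the key keeps all records of one variety adjacent in a sorted key space, which is one common reason to choose a composite key; other orderings may suit other access patterns.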
Further, the process by which the cross-domain feature interaction map construction module constructs the cross-domain feature interaction map based on the multi-source heterogeneous rapeseed data pool, performs deep feature learning on the constructed map with a graph neural network, analyzes the gene-environment interaction relationship, and outputs the updated cross-domain feature interaction map comprises the following steps:
separating genotype nodes, environment nodes, and phenotype nodes from the rapeseed standardized data pool, and one-hot encoding each node to generate an initial feature vector that serves as the initial feature representation of the map;
after the initial feature representation of the nodes is completed, performing association analysis on the gene nodes and environment nodes, namely calculating the mutual information value of a gene node and an environment node, where the mutual information value quantifies the degree of association between the two node types;
generating a static map according to the constructed gene-environment binary edge set and the initial feature representation of the nodes, namely: aggregating neighboring node features through a graph convolution layer so that each target node captures the information of its surrounding neighbors, and fusing the captured information into the feature representation of the target node to generate a node embedding;
inputting the generated static map into a three-layer graph attention network, and capturing the temporal change of the gene-environment edge weights with a gated recurrent unit;
defining a gene-environment interaction strength index, calculated as the edge weight multiplied by the cosine similarity of the node embeddings, where the edge weight represents the association strength between a gene node and an environment node, and the cosine similarity measures how similar the two nodes are in feature space;
introducing a graph contrastive learning framework to further analyze the strongly associated edges whose index values rank in the top N: performing a perturbation operation on the static map, the perturbation comprising random modification of node features and edge weights; calculating the interaction strength change rate by comparing the node representations of the map before and after perturbation; setting a change rate threshold M and screening out the key feature edges whose interaction strength change rate is greater than or equal to M; and integrating all screened key feature edges into the static map to form the final dynamic map, namely the updated cross-domain feature interaction map.
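The mutual-information association step above can be sketched as follows. The sketch estimates mutual information from paired discrete observations using the standard definition and keeps the gene-environment pairs that reach a threshold. Treating the retained pairs as the gene-environment edge set, measuring in nats, and the sample data are assumptions; continuous environment readings would need to be binned into discrete states first.

```python
import math
from collections import Counter

def mutual_information(gene_states, env_states):
    """Empirical mutual information (in nats) of two paired discrete sequences."""
    n = len(gene_states)
    joint = Counter(zip(gene_states, env_states))
    count_g = Counter(gene_states)
    count_e = Counter(env_states)
    mi = 0.0
    for (g, e), count in joint.items():
        p_ge = count / n
        mi += p_ge * math.log(p_ge / ((count_g[g] / n) * (count_e[e] / n)))
    return mi

def build_edge_set(gene_nodes, env_nodes, threshold):
    """Keep (gene, environment) pairs whose mutual information reaches the threshold."""
    edges = {}
    for g_name, g_states in gene_nodes.items():
        for e_name, e_states in env_nodes.items():
            mi = mutual_information(g_states, e_states)
            if mi >= threshold:
                edges[(g_name, e_name)] = mi
    return edges

# Hypothetical observations over four samples.
gene_nodes = {"j1": ["AA", "AA", "aa", "aa"]}
env_nodes = {"e1": ["wet", "wet", "dry", "dry"],   # tracks j1 exactly
             "e2": ["hot", "cold", "hot", "cold"]}  # independent of j1
edges = build_edge_set(gene_nodes, env_nodes, threshold=0.5)
print(edges)
```

In this toy data only the pair (j1, e1) survives the 0.5 threshold, because e2 carries no information about j1.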
Further, the process of inputting the generated static map into the three-layer graph attention network and capturing the temporal change of the gene-environment edge weights with the gated recurrent unit comprises:
each graph attention layer adopts a multi-head attention mechanism, which learns the attention relations among nodes from several different subspaces; in each layer, attention coefficients among nodes are calculated, where the attention coefficients reflect the importance of, and the association strength between, nodes;
the change rate of the environment parameters is used as the gating signal, and the gated recurrent unit dynamically corrects the weights of the gene-environment edges according to the changes in the environment parameters.
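The idea of letting the environment change rate gate the edge-weight update can be illustrated with a deliberately simplified sketch. This is not a trained gated recurrent unit: it keeps only a scalar update gate of the GRU form, and the `gain` and `bias` values, the candidate weight, and the function names are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_edge_update(w_prev, w_candidate, env_change_rate, gain=10.0, bias=-1.0):
    """GRU-style interpolation: a larger environment change opens the gate wider.

    Returns the corrected edge weight and the gate value z in (0, 1).
    gain and bias are illustrative constants, not learned parameters.
    """
    z = sigmoid(gain * abs(env_change_rate) + bias)
    return (1.0 - z) * w_prev + z * w_candidate, z

# Hypothetical edge: previous weight 0.4, candidate weight 0.8.
w_still, z_still = gated_edge_update(0.4, 0.8, env_change_rate=0.0)
w_moved, z_moved = gated_edge_update(0.4, 0.8, env_change_rate=0.10)
print(round(w_still, 3), round(w_moved, 3))
```

With a stable environment the weight stays close to its previous value; with a 10% change the gate opens to 0.5 and the weight moves halfway to the candidate.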
Further, the process by which the multi-modal feature fusion and optimization module parses the cross-domain feature interaction map through the preset four-branch multi-head attention architecture, generates dual weights and fused features in combination with the cross-modal attention mechanism, and, after multi-head attention screening, generates through dual optimization the optimized comprehensive feature vector that fuses multi-dimensional information comprises the following steps:
inputting the constructed cross-domain feature interaction map into a preset four-branch multi-head attention sub-module, which is divided into four units, namely a gene focus head, an environment focus head, a phenotype focus head, and an interaction focus head, that process the gene, environment, phenotype, and gene-environment interaction features respectively;
at the same time introducing a cross-modal attention mechanism between the gene branch and the environment branch, namely performing a dot product between the gene features and the environment features to calculate the cross-correlation strength between them;
concatenating the outputs of the four branches and inputting the concatenated feature vector into a multi-layer perceptron, which further analyzes the input and generates a four-dimensional weight matrix; normalizing the four-dimensional weight matrix with a Softmax function to obtain the gene-environment interaction weight and the phenotype response weight;
performing a weighted summation of the gene, environment, phenotype, and gene-environment interaction features based on the gene-environment interaction weight and the phenotype response weight to form an initial fusion vector;
capturing, with a multi-head self-attention mechanism, the associations between elements of the initial fusion vector from several different subspaces so as to calculate a global association estimate; setting a global association threshold K and, by comparing each global association estimate with K, screening out the core feature subset whose estimates are greater than or equal to K;
providing a residual connection sub-module: inputting the features of the cross-domain feature interaction map and the screened core feature subset into a convolutional neural network to generate a residual correction term; performing non-negative matrix factorization on the residual correction term and extracting the latent factors as constituent units of the optimized comprehensive feature vector; and concatenating the extracted latent factors with the previously screened core features and weighting them by attention scores to generate the final optimized comprehensive feature vector.
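The dot-product, Softmax weighting, weighted summation, and threshold screening steps above can be sketched in a few lines. The sketch assumes one scalar score per branch in place of the multi-layer perceptron output, assigns one Softmax weight to each of the four branches, and takes the per-element association estimates as given; these simplifications and all names are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

def dot(a, b):
    """Dot product, used here as the gene-environment cross-correlation strength."""
    return sum(x * y for x, y in zip(a, b))

def fuse(branches, scores):
    """Weighted sum of equal-length branch vectors using softmax-normalized scores."""
    weights = softmax(scores)
    dim = len(branches[0])
    fused = [sum(w * b[i] for w, b in zip(weights, branches)) for i in range(dim)]
    return fused, weights

def screen_core(estimates, k):
    """Indices of elements whose association estimate is at least the threshold K."""
    return [i for i, r in enumerate(estimates) if r >= k]

# Hypothetical 2-dimensional branch outputs: gene, environment, phenotype, interaction.
gene, env, pheno, inter = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]
fused, weights = fuse([gene, env, pheno, inter], scores=[0.0, 0.0, 0.0, 0.0])
core = screen_core([0.9, 0.2, 0.6], k=0.5)
print(fused, weights, dot(gene, env), core)
```

Equal scores give every branch a weight of 0.25, so the fused vector is the plain average of the four branches; unequal scores shift the fusion toward the higher-scoring branch.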
Further, the process by which the feature evaluation and analysis module decodes and analyzes the optimized comprehensive feature vector so as to construct the rapeseed seed quality evaluation model and extract high-yield rapeseed variety data comprises the following steps:
decoupling the feature domains with a sparse autoencoder, in which an L1 regularization constraint forces the hidden-layer activations to be sparse, and separating out a gene feature sub-vector, an environment feature sub-vector, a phenotype feature sub-vector, and an interaction feature sub-vector, each corresponding to an independent feature domain;
constructing a capsule processing unit for each decoupled feature domain to form a parallel feature-domain processing pipeline, namely determining the information transfer weights among the sub-vectors through an iterative routing algorithm;
aggregating the decoupled feature sub-vectors by weighting them with the routing weights and passing them layer by layer through the capsule network, where, in the local feature fusion stage, capsules of adjacent feature domains are combined in pairs and the local interaction pattern is refined through dynamic routing;
setting up a multi-scale feature fusion mechanism on the result of this progressive feature refinement, and finally outputting fused features at two levels, namely local feature fusion and global feature fusion, where local feature fusion retains interaction detail through the weighted combination of adjacent feature-domain capsules;
quantitatively analyzing the local and global fused features obtained under the multi-scale feature fusion mechanism with a feature importance evaluation method, and determining the contribution of each feature domain and of their combinations to the labeling of rapeseed varieties;
screening out, according to the quantitative analysis, the feature domains and combinations that play a key role in labeling high-yield rapeseed varieties, using them as the criteria for subsequent labeling and screening, and finally generating a labeling result for each rapeseed variety.
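The iterative routing step above can be illustrated with routing-by-agreement as commonly used in capsule networks. The text does not say which routing variant is intended, so this choice, the squash nonlinearity, the number of iterations, and the toy prediction vectors are all assumptions.

```python
import math

def squash(v):
    """Shrink a vector so its length lies in [0, 1) while keeping its direction."""
    norm_sq = sum(x * x for x in v)
    if norm_sq == 0.0:
        return [0.0] * len(v)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in v]

def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [x / total for x in exps]

def dynamic_routing(u_hat, iterations=3):
    """u_hat[i][j] is the prediction of input capsule i for output capsule j."""
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_out for _ in range(n_in)]
    for _ in range(iterations):
        # Coupling coefficients: each input capsule distributes itself over outputs.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_out):
            s = [sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            outputs.append(squash(s))
        # Raise the logit where an input's prediction agrees with the output.
        for i in range(n_in):
            for j in range(n_out):
                logits[i][j] += sum(a * b for a, b in zip(u_hat[i][j], outputs[j]))
    return outputs, coupling

# Hypothetical predictions from four feature-domain capsules
# (gene, environment, phenotype, interaction) for two output capsules.
u_hat = [
    [[1.0, 0.0], [0.1, 0.0]],
    [[0.9, 0.1], [0.0, 0.1]],
    [[1.0, 0.1], [0.1, 0.1]],
    [[0.0, 0.2], [0.0, 1.0]],
]
outputs, coupling = dynamic_routing(u_hat)
print(outputs, coupling)
```

The coupling coefficients play the role of the information transfer weights: each row sums to 1, and an input capsule sends more of its signal to the output capsule it agrees with.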
Further, the process by which the transfer learning adaptation and deployment module deploys a generative adversarial network with the extracted high-yield rapeseed variety data, adapts the quality evaluation model to the feature space of new varieties through transfer learning, and completes intelligent evaluation of the seed quality of new rapeseed varieties comprises:
taking the extracted high-yield rapeseed variety data as source-domain data and inputting them into a preset adversarial training sub-module to pre-train the quality evaluation model, so that it learns general feature representations related to rapeseed seed quality; transferring the knowledge that the pre-trained model acquired on the high-yield rapeseed variety data set to the seed quality evaluation task for new varieties through a transfer learning strategy; and at the same time adjusting and optimizing the parameters of the quality evaluation model according to the characteristics of the new variety data.
Compared with the prior art, the invention has the following advantages. By acquiring multi-source heterogeneous rapeseed data and constructing a data pool, it integrates data from different channels and with different structures, breaks down data silos, provides a comprehensive and rich data basis for subsequent analysis, and makes it easier to mine more latent information. By constructing a cross-domain feature interaction map and learning it with a graph neural network, it presents the interaction between gene and environment features intuitively and mines the gene-environment interaction mechanism in depth, and the updated map it outputs provides a more accurate basis for subsequent analysis. By parsing the map with a multi-branch multi-head attention architecture combined with a cross-modal attention mechanism, it attends to the importance of features in different dimensions, fuses multi-dimensional information, and applies dual optimization to generate an optimized comprehensive feature vector, which improves the quality and representativeness of the features and strengthens the model's ability to handle complex data. By decoding the optimized feature vector, it constructs an evaluation model that can be applied directly to rapeseed seed quality evaluation, and the extracted high-yield variety data provide an important reference for subsequent research on new varieties and help screen varieties with high-yield potential quickly. By deploying a generative adversarial network and adapting the model through transfer learning, it extends the evaluation capability quickly to new varieties and supports intelligent breeding.
Drawings
FIG. 1 is a block diagram of the rapeseed seed quality evaluation model based on big data.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to FIG. 1, a rapeseed seed quality evaluation model based on big data comprises:
a multi-source heterogeneous data acquisition and preprocessing module, used for acquiring multi-source heterogeneous rapeseed data and processing and analyzing the acquired data to generate a multi-source heterogeneous rapeseed data pool;
a cross-domain feature interaction map construction module, used for constructing a cross-domain feature interaction map based on the multi-source heterogeneous rapeseed data pool, performing deep feature learning on the constructed map with a graph neural network, analyzing the gene-environment interaction relationship, and outputting an updated cross-domain feature interaction map;
a multi-modal feature fusion and optimization module, used for parsing the cross-domain feature interaction map through a preset four-branch multi-head attention architecture, generating dual weights and fused features in combination with a cross-modal attention mechanism, and, after multi-head attention screening, generating through dual optimization an optimized comprehensive feature vector that fuses multi-dimensional information;
a feature evaluation and analysis module, used for decoding and analyzing the optimized comprehensive feature vector so as to construct the rapeseed seed quality evaluation model and extract high-yield rapeseed variety data; and
a transfer learning adaptation and deployment module, used for deploying a generative adversarial network with the extracted high-yield rapeseed variety data, adapting the quality evaluation model to the feature space of new varieties through transfer learning, and completing intelligent evaluation of the seed quality of new rapeseed varieties.
It should be further noted that, in specific implementation, the process by which the multi-source heterogeneous data acquisition and preprocessing module acquires multi-source heterogeneous rapeseed data and processes and analyzes the acquired data to generate the multi-source heterogeneous rapeseed data pool is as follows:
Rapeseed whole-genome SNP marker data are extracted from a gene database, field phenotype data are collected through an Internet of Things sensor network, the real-time monitoring data stream of a weather station is accessed to obtain field environment parameters, and dynamic image data over the rapeseed growth cycle are acquired with an unmanned aerial vehicle (UAV) carrying a multispectral camera. The phenotype data collected by the field sensors include the number of branches, yield, and the like, and the field environment parameters include environmental monitoring data such as soil humidity, temperature, and illumination intensity.
All collected data sets are stored uniformly in an HBase distributed database and preprocessed, namely: duplicate sites and low-quality sites are removed from the whole-genome SNP marker data through data cleaning, and the set of valid SNP sites is retained to obtain a genotype data set; the time-series records of the field environment parameters are parsed and an environment parameter time-series index is established; the UAV aerial image data are retrieved, features are extracted from the dynamic image data with computer vision techniques, image timestamps are labeled in segments by growth period, and vegetation indices are extracted as primary phenotype features; and the collected field phenotype data are aligned with the extracted primary phenotype features in time and space to eliminate sampling-time deviation, thereby obtaining collaborative phenotype features.
An SNP-chromosome association matrix is constructed based on the genotype data set to serve as the gene-dimension base framework of the rapeseed raw data pool; the environment parameter time-series index and the collaborative phenotype features are mapped onto the gene-dimension base framework by timestamp to establish cross-domain data associations; the column-family storage model of the HBase distributed database is used to divide the data into column families by genotype, environment, and phenotype, and a composite encoding rule is set for the row key; and data integrity verification is performed, checking the consistency of the cross-domain data through hash check codes, to complete construction of the rapeseed raw data pool.
The rapeseed raw data pool is standardized to obtain the rapeseed standardized data pool. Specifically, cluster analysis is performed on the raw data pool: the DBSCAN algorithm identifies data outliers caused by extreme weather, and abnormal rapeseed data samples are eliminated based on a local density threshold. Missing measurement values are filled with a time-series interpolation algorithm, and Z-score standardization is applied to the filled data to eliminate differences in scale, generating a standardized data pool on a common zero-mean, unit-variance scale.
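The gap-filling and Z-score steps above can be sketched with the standard library only. Linear interpolation is assumed as the time-series interpolation algorithm, the population standard deviation is assumed for the Z-score, and the DBSCAN outlier step is left out of this sketch; the sample readings are hypothetical.

```python
import math

def interpolate_missing(series):
    """Fill None gaps linearly; gaps at either end take the nearest known value."""
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no known values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled

def z_score(values):
    """Standardize to zero mean and unit (population) variance."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0.0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical soil-humidity readings with two missing measurements.
raw = [10.0, None, 14.0, None]
filled = interpolate_missing(raw)
standardized = z_score(filled)
print(filled, standardized)
```

Z-score standardization rescales each variable but does not change the shape of its distribution, so a skewed variable stays skewed after this step.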
It should be further noted that, in specific implementation, the process by which the cross-domain feature interaction map construction module constructs the cross-domain feature interaction map based on the multi-source heterogeneous rapeseed data pool, performs deep feature learning on the constructed map with a graph neural network, analyzes the gene-environment interaction relationship, and outputs the updated cross-domain feature interaction map is as follows:
Genotype nodes (SNP site sequences), environment nodes (time-series data such as soil humidity and temperature), and phenotype nodes (NDVI, plant height, and the like) are separated from the rapeseed standardized data pool, and each node is one-hot encoded to generate an initial feature vector that serves as the initial feature representation of the map. For example, genotype, environment, and phenotype data for 1000 samples are obtained from the standardized data pool, where the genotype data contain information on 50 sites, the environment data contain 10 environment parameters such as temperature, humidity, and illumination, and the phenotype data contain 8 phenotype indices such as plant height and yield. A genotype node has 50 possible states and is encoded as a 50-dimensional vector, an environment node is encoded as a 10-dimensional vector, and a phenotype node as an 8-dimensional vector, which gives the initial feature vectors.
After the initial feature representation of the nodes is completed, association analysis is performed on the gene nodes and environment nodes, namely the mutual information value $I(G;E)$ of a gene node and an environment node is calculated, where the mutual information value quantifies the degree of association between the two node types. The formula is:
$$I(G;E)=\sum_{g\in G}\sum_{e\in E}p(g,e)\,\log\frac{p(g,e)}{p(g)\,p(e)};$$
In the formula, $g$ is the gene node, $G$ is the set of states of the gene node, $e$ is the environment node, $E$ is the set of states of the environment node, $p(g,e)$ is the joint probability, and $p(g)$ and $p(e)$ are the marginal probabilities. A mutual information threshold (such as 0.5) is set and the calculated mutual information value (assumed to be 0.3) is compared against it to screen out the node pairs with significant association;
A static map is generated according to the constructed gene-environment binary edge set and the initial feature representation of the nodes: a graph convolution layer aggregates neighboring node features so that each target node captures the information of its surrounding neighbors, and the captured information is fused into the feature representation of the target node to generate a node embedding. When the static map is constructed, this aggregation involves information exchange among multiple nodes in the graph structure; at each step of the graph convolution, one node is selected as the object currently being processed, and that node is the target node. For example, suppose the gene-environment interaction map contains gene nodes j1, j2, and j3 and environment nodes e1 and e2. When the graph convolution is applied to gene node j1, j1 is the target node. If j1 is connected to environment nodes e1 and e2, the graph convolution layer first captures the feature information of e1 and e2 and then fuses it into the feature representation of j1. The updated feature of j1 thus contains both its own initial features and the feature information of e1 and e2, and so describes the interaction between j1 and the environment more accurately;
The generated static map is input into a three-layer graph attention network, and a gated recurrent unit captures the temporal change of the gene-environment edge weights. Each graph attention layer adopts a multi-head attention mechanism (such as 8 heads), which learns the attention relations among nodes from several different subspaces so as to capture the interaction information among nodes more comprehensively. In each layer, attention coefficients among nodes are calculated, where the attention coefficients reflect the importance of, and the association strength between, nodes; the node representations are then updated according to the calculated attention coefficients so that each node fuses the information of its neighbors. The change rate of the environment parameters is used as the gating signal, and the gated recurrent unit dynamically corrects the weights of the gene-environment edges according to the changes in the environment parameters, so that the edge weights in the map are updated in real time as the environment changes and the dynamic interaction between genes and environment is described;
A gene-environment interaction strength index $S_{ge}$ is defined, calculated as the edge weight $w_{ge}$ multiplied by the cosine similarity of the node embeddings $\cos(h_g,h_e)$, i.e. $S_{ge}=w_{ge}\cdot\cos(h_g,h_e)$. Based on the gene-environment interaction strength index, the strongly associated edges whose index values rank in the top N (such as 50) are screened, where N is a preset positive integer that limits the number of strongly associated edges selected from the gene-environment edges after they are sorted by index value, so as to focus on the edges that play a key role in gene-environment interaction and to fix the candidate range for the subsequent precise screening of key feature edges;
A graph contrastive learning framework is introduced to further analyze the top-N strongly associated edges: a perturbation operation, comprising random modification of node features and edge weights, is applied to the static map, and the interaction strength change rate is calculated by comparing the node-representation differences between the maps before and after perturbation;
Setting a change rate threshold M, screening key feature edges with the change rate of interaction strength being greater than or equal to the change rate threshold M, integrating all the key feature edges obtained by screening into a static map to form a final dynamic map, namely an updated cross-domain feature interaction map;
For example, suppose that within one month the temperature changes by 10% and the humidity by 5%. The weights of the gene-environment edges are dynamically corrected according to these change rates, the gene-environment interaction strength index is defined, and the 50 strongly associated edges with the highest index values are screened out. A perturbation operation is then applied to the static map, such as randomly modifying 5% of the node features and 3% of the edge weights, and the interaction strength change rate is calculated by comparing the node-representation differences of the maps before and after perturbation. With the change rate threshold M set to 0.1, 30 key feature edges whose change rate is greater than or equal to 0.1 are screened out and integrated into the static map to form the final dynamic map.
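The threshold screening in this example can be sketched as follows. The exact formula for the interaction strength change rate is not reproduced in this text, so the sketch assumes a relative change of the interaction strength index, |after - before| / |before|; that choice, the function names and the index values are all assumptions made for illustration.

```python
def change_rate(before, after, eps=1e-12):
    """Assumed definition: relative change of the interaction index."""
    return abs(after - before) / (abs(before) + eps)


def screen_key_edges(index_before, index_after, threshold):
    """Keep edges whose change rate is greater than or equal to threshold M."""
    return [
        edge
        for edge in index_before
        if change_rate(index_before[edge], index_after[edge]) >= threshold
    ]


# Hypothetical index values of three strong edges before and after the
# static map is perturbed.
index_before = {("j1", "e1"): 0.90, ("j2", "e2"): 0.40, ("j3", "e2"): 0.35}
index_after = {("j1", "e1"): 0.93, ("j2", "e2"): 0.30, ("j3", "e2"): 0.35}

key_edges = screen_key_edges(index_before, index_after, threshold=0.1)
print(key_edges)
```

Only the edge whose index moves by at least 10% under perturbation survives the screening and would be integrated into the dynamic map.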
It should be further described that, in a specific implementation, the multi-modal feature fusion and optimization module parses the cross-domain feature interaction map through a preset four-branch multi-head attention architecture, combines it with a cross-modal attention mechanism to generate dual weights and fused features, and, after multi-head attention screening and dual optimization, generates an optimized comprehensive feature vector that fuses multi-dimensional information. The process is as follows:
The constructed cross-domain feature interaction map is input into a preset four-branch multi-head attention sub-module, which is divided into four units, namely a gene focus head, an environment focus head, a phenotype focus head and an interaction focus head, which process the gene, environment, phenotype and gene-environment interaction features respectively;
After the unit division is completed, each focus head independently performs its internal calculation, including computing the query, key and value matrices; at the same time, a cross-modal attention mechanism is introduced between the gene branch and the environment branch, that is, a dot product operation is performed between the gene features and the environment features to calculate the cross-correlation strength between them;
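A minimal sketch of the cross-modal dot product between the gene branch and the environment branch. The text only specifies a dot product between gene and environment features; the scaling by the square root of the dimension and the softmax normalization follow the usual scaled dot-product attention and are assumptions here, as are the feature values.

```python
import math


def softmax(scores):
    """Normalize raw scores so that they sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(gene_feats, env_feats):
    """Gene features act as queries; environment features act as keys and
    values. Returns the cross-correlation strengths and attended vectors."""
    dim = len(env_feats[0])
    strengths, attended = [], []
    for query in gene_feats:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in env_feats
        ]
        weights = softmax(scores)
        strengths.append(weights)
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, env_feats)) for i in range(dim)]
        )
    return strengths, attended


# Hypothetical feature rows for two genes and three environmental factors.
gene_feats = [[1.0, 0.0], [0.0, 1.0]]
env_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
strengths, attended = cross_modal_attention(gene_feats, env_feats)
print(strengths[0])
```

Each row of `strengths` indicates how strongly one gene feature correlates with each environmental feature; the first gene, for instance, attends most to the environmental feature that points in the same direction.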
The outputs of the four branches (gene focus head, environment focus head, phenotype focus head and interaction focus head) are concatenated, so that the concatenated feature vector contains the combined information of genes, environment, phenotype and gene-environment interaction. The concatenated feature vector is input into a multi-layer perceptron, which further analyzes it (nonlinear transformation and feature extraction) and finally generates a four-dimensional weight matrix. The four-dimensional weight matrix is normalized by a Softmax function to obtain the gene-environment interaction weight and the phenotype response weight, where the gene-environment interaction weight reflects the importance of the interaction between genes and the environment, and the phenotype response weight reflects the sensitivity of the phenotype to changes in genes and the environment;
Performing weighted summation operation on genes, environments, phenotypes and gene-environment interaction characteristics based on the gene-environment interaction weight and the phenotype response weight to form an initial fusion vector;
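The normalization and weighted summation can be sketched as follows. The sketch assumes one scalar weight per branch and takes the raw scores as given numbers standing in for the output of the multi-layer perceptron, which is not modelled; the branch vectors are hypothetical.

```python
import math


def softmax(scores):
    """Normalize raw scores so that they sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_branches(branch_outputs, raw_scores):
    """Softmax-normalize one raw score per branch, then form the weighted
    sum of the branch vectors (the initial fusion vector)."""
    weights = softmax(raw_scores)
    dim = len(branch_outputs[0])
    fused = [
        sum(w * vec[k] for w, vec in zip(weights, branch_outputs))
        for k in range(dim)
    ]
    return weights, fused


# Hypothetical outputs of the gene, environment, phenotype and interaction
# focus heads, in that order.
branches = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
raw_scores = [0.0, 0.0, 0.0, math.log(3.0)]  # stand-in for the MLP output

weights, initial_fusion = fuse_branches(branches, raw_scores)
print(weights, initial_fusion)
```

In this toy case the interaction branch receives half of the total weight and the other three branches share the rest equally.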
The multi-head self-attention mechanism captures the association relation between elements in the initial fusion vector from a plurality of different subspaces, so as to calculate a global association estimation value, wherein the global association estimation value reflects the importance degree of each element in the vector in the whole vector; setting a global association threshold K (e.g. 0.8), and screening out a core feature subset with the global association estimated value greater than or equal to K by comparing the global association estimated value with the threshold K, wherein the core feature subset contains the most critical and important information in the initial fusion vector;
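One possible reading of the global association estimate and the threshold K is sketched below. It assumes that the estimate of an element is the average attention it receives from all elements, rescaled so that the largest value is 1; the text does not fix this definition, and the attention matrix is a made-up example rather than the output of a trained multi-head self-attention layer.

```python
def global_association(attention_matrix):
    """Assumed estimate: mean attention each element receives (column mean),
    divided by the largest column mean so that values lie in (0, 1]."""
    n_rows = len(attention_matrix)
    n_cols = len(attention_matrix[0])
    received = [
        sum(attention_matrix[i][j] for i in range(n_rows)) / n_rows
        for j in range(n_cols)
    ]
    largest = max(received)
    return [value / largest for value in received]


def core_feature_subset(attention_matrix, threshold_k):
    """Indices of elements whose global association estimate is >= K."""
    estimates = global_association(attention_matrix)
    return [j for j, value in enumerate(estimates) if value >= threshold_k]


# Hypothetical self-attention weights over three elements of the initial
# fusion vector (each row sums to 1).
attention_matrix = [
    [0.40, 0.40, 0.20],
    [0.45, 0.35, 0.20],
    [0.35, 0.45, 0.20],
]
core = core_feature_subset(attention_matrix, threshold_k=0.8)
print(core)
```

With K = 0.8 the third element, which receives only half as much attention as the other two, is dropped from the core feature subset.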
A residual connection sub-module is provided. The features of the cross-domain feature interaction map (comprising the feature information of genes, environment, phenotype and gene-environment interaction) and the screened core feature subset are input into a convolutional neural network to generate a residual correction term. Non-negative matrix factorization is then performed on the residual correction term, that is, the high-dimensional residual correction term is decomposed into a low-dimensional latent factor matrix and a coefficient matrix, and the latent factors are taken as the latent feature patterns in the residual correction term. It can be understood that, in the gene feature dimension, a latent factor corresponds to a core component of a gene expression pattern in the gene regulatory network; in the environment feature dimension, it reflects a combination of key environmental factors affecting rape traits or a latent driver of environmental change; in the phenotype feature dimension, it represents a latent source of phenotypic variation or a phenotype classification; and for gene-environment interaction, it reflects a latent mechanism of that interaction. The latent factors are extracted as constituent units of the optimized comprehensive feature vector, concatenated with the previously screened core features, and weighted by attention to generate the final optimized comprehensive feature vector.
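The factorization step can be illustrated with the standard multiplicative-update rules for non-negative matrix factorization, written here in plain Python. The sketch assumes the residual correction term is already non-negative (for example, after a ReLU); the matrix values, the rank and the naming of the two factors are hypothetical, and the convolutional network that would produce the residual term is not modelled.

```python
import random


def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frob_error(v, w, h):
    """Frobenius norm of the reconstruction error V - WH."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    ) ** 0.5


def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor V (n x m, non-negative) into W (n x rank, coefficients) and
    H (rank x m, latent factors) with multiplicative updates."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical non-negative residual correction term (4 samples x 3 features).
residual = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 0.0, 1.0], [2.0, 0.0, 2.0]]
coefficients, latent_factors = nmf(residual, rank=2)
print(frob_error(residual, coefficients, latent_factors))
```

Each row of `latent_factors` plays the role of one latent feature pattern, and each row of `coefficients` says how strongly a sample expresses those patterns.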
It should be further described that, in the specific implementation process, the feature evaluation analysis module decodes and analyzes the optimized comprehensive feature vector, so as to construct a rape seed quality evaluation model, and the process of extracting high-yield rape variety data is as follows:
A sparse autoencoder is used to decouple the feature domains: an L1 regularization constraint forces the hidden-layer activations to be sparse, and the gene feature sub-vector G, the environment feature sub-vector E, the phenotype feature sub-vector P and the interaction feature sub-vector are separated out, each belonging to a real space R^d, where R denotes the real space (the feature values are specific real numbers) and d denotes the dimension of the corresponding sub-vector (the number of parameters describing the gene, environment, phenotype or interaction features). Each sub-vector corresponds to an independent feature domain and represents one semantically clear class of features in the original feature vector;
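A minimal sketch of the two ingredients named in this step: a loss that adds an L1 penalty on the hidden activations to the reconstruction error, and the separation of a hidden vector into per-domain sub-vectors. The encoder and decoder networks are not modelled; the penalty weight, the contiguous slicing of the hidden vector and the sub-vector sizes are assumptions.

```python
def sparse_autoencoder_loss(x, x_reconstructed, hidden, l1_weight=0.01):
    """Mean squared reconstruction error plus an L1 penalty that pushes
    hidden-layer activations toward zero (sparsity)."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_reconstructed)) / len(x)
    l1_penalty = sum(abs(h) for h in hidden)
    return mse + l1_weight * l1_penalty


def split_feature_domains(hidden, sizes):
    """Cut the hidden vector into contiguous per-domain sub-vectors.
    `sizes` maps a domain name to the dimension d of its sub-vector."""
    sub_vectors, start = {}, 0
    for name, size in sizes.items():
        sub_vectors[name] = hidden[start:start + size]
        start += size
    return sub_vectors


# Hypothetical sparse hidden activations and sub-vector dimensions.
hidden = [0.7, 0.0, 0.0, 0.4, 0.0, 0.9]
sizes = {"gene": 2, "environment": 1, "phenotype": 2, "interaction": 1}
domains = split_feature_domains(hidden, sizes)

loss = sparse_autoencoder_loss([1.0, 2.0], [1.0, 2.0], [0.5, -0.5], l1_weight=0.1)
print(domains, loss)
```

In the loss example the reconstruction is perfect, so the whole loss comes from the sparsity penalty on the two non-zero activations.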
A corresponding capsule processing unit is constructed for each decoupled feature domain (gene/environment/phenotype/interaction), forming a parallel feature-domain processing pipeline. The information transfer weights among the sub-vectors are determined by an iterative routing algorithm: initial routing coefficients are set and then adjusted according to the agreement between the predictions of the lower-layer capsules and the higher-layer capsules. A processing pipeline is constructed for each feature domain as follows: the gene pipeline is obtained by initializing 1D-CNN convolution kernels; the environment pipeline is obtained by loading pre-trained ResNet-18 weights; the phenotype pipeline is obtained by constructing a knowledge distillation sub-module that loads plant phenomics prior knowledge; and the gene-environment interaction pipeline is obtained by initializing a bilinear fusion matrix;
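The iterative adjustment of routing coefficients by prediction agreement can be sketched with the widely used routing-by-agreement procedure for capsule networks. This is a generic sketch, not necessarily the exact routing used in this application; the prediction vectors, the number of capsules and the three iterations are assumptions.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def squash(vec):
    """Shrink a vector so its length lies in [0, 1) while keeping direction."""
    norm_sq = sum(x * x for x in vec)
    if norm_sq == 0.0:
        return [0.0 for _ in vec]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in vec]


def dynamic_routing(predictions, iterations=3):
    """predictions[i][j] is lower capsule i's prediction for upper capsule j.
    Routing logits start at zero and grow where predictions agree with the
    upper capsule's output."""
    n_lower, n_upper = len(predictions), len(predictions[0])
    dim = len(predictions[0][0])
    logits = [[0.0] * n_upper for _ in range(n_lower)]
    for _ in range(iterations):
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_upper):
            total = [
                sum(coupling[i][j] * predictions[i][j][k] for i in range(n_lower))
                for k in range(dim)
            ]
            outputs.append(squash(total))
        for i in range(n_lower):
            for j in range(n_upper):
                logits[i][j] += sum(
                    p * o for p, o in zip(predictions[i][j], outputs[j])
                )
    return coupling, outputs


# Hypothetical predictions: capsules 0 and 1 agree on upper capsule 0,
# capsule 2 disagrees with them.
predictions = [
    [[1.0, 0.0], [0.0, 0.1]],
    [[1.0, 0.0], [0.0, -0.1]],
    [[-0.2, 0.0], [0.0, 0.1]],
]
coupling, outputs = dynamic_routing(predictions)
print(coupling)
```

Lower capsules whose predictions agree with an upper capsule end up routing more of their output to it, while the dissenting capsule routes less.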
The decoupled feature sub-vectors are aggregated by weighting according to the routing weights and passed layer by layer through the capsule network. First, in the local feature fusion stage, capsules of adjacent feature domains (such as gene-environment and environment-phenotype) are combined in pairs, and local interaction patterns are refined through dynamic routing. Then, in the global feature fusion stage, all feature-domain capsules are integrated into a unified high-level representation, forming a progressive, local-to-global feature refining path, as follows:
Gene feature refining: the gene feature sub-vector G is input into the 1D-CNN, which outputs a primary gene pattern; sequence labeling is performed through a CRF layer to identify disease-resistance-related SNP combination patterns, and a gene quality score is generated, the score being computed with an activation function and a gene feature weight matrix;
Environment feature refining: E is input into a spatio-temporal Transformer and decomposed by a separable attention mechanism into static attributes Es (soil type, etc.) and a dynamic time series Ed (precipitation/temperature sequences), and an environmental adaptability score is generated, where ME is a multi-layer decision processor (mining the regularities of static environment features through a multi-layer neural network), LE is a time-series memory sub-module (processing environmental data such as temperature and precipitation that change over time), and the two outputs are combined by a feature fusion operation (jointly analyzing the static attributes and the dynamic time-series information);
Phenotype feature refining: a cross-layer graph structure is constructed whose nodes are phenotype features and whose edges are the genetic correlations among phenotypes; information is propagated through a GAT network, the phenotype representation P' is updated, and a phenotype integrity score is generated from attention weights over the different phenotype feature dimensions (such as plant height, number of fruits, thousand-grain weight and other specific trait indices);
Interaction feature refining: bilinear pooling is performed on the interaction features and an interaction strength score is generated, where tanh is the activation function, the formula involves the transpose of the interaction feature weight matrix, a bias term calculated for the interaction features, and the interaction feature matrix after bilinear pooling (containing multi-dimensional gene-environment interaction information), and F denotes the norm used in the interaction strength score;
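The interaction strength score can be illustrated as follows. The score formula itself is not reproduced in this text; the sketch assumes one plausible form consistent with the listed ingredients, namely score = tanh(||W^T Z + b||_F), where Z is the bilinear pooling (outer product) of a gene vector and an environment vector, W is the interaction feature weight matrix, b is a scalar bias and ||.||_F is taken to be the Frobenius norm. All of these choices and all numbers are assumptions.

```python
import math


def outer(u, v):
    """Bilinear pooling of two vectors: the matrix of pairwise products."""
    return [[a * b for b in v] for a in u]


def interaction_score(gene_vec, env_vec, weight, bias):
    """Assumed form: tanh of the Frobenius norm of (W^T Z + b)."""
    pooled = outer(gene_vec, env_vec)              # Z, shape (dg, de)
    rows, cols, k = len(pooled), len(pooled[0]), len(weight[0])
    projected = [                                  # W^T Z + b, shape (k, de)
        [sum(weight[i][a] * pooled[i][j] for i in range(rows)) + bias
         for j in range(cols)]
        for a in range(k)
    ]
    frobenius = math.sqrt(sum(x * x for row in projected for x in row))
    return math.tanh(frobenius)


# Hypothetical gene and environment sub-vectors and weight matrix (dg x k).
gene_vec = [0.5, 0.2]
env_vec = [0.3, 0.1, 0.4]
weight = [[0.6, -0.2], [0.1, 0.8]]

score = interaction_score(gene_vec, env_vec, weight, bias=0.05)
print(score)
```

Because the norm is non-negative, the tanh maps the score into the interval from 0 to 1, with 0 meaning no detectable interaction.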
According to the results of the progressive feature refining, a multi-scale feature fusion mechanism is set, and the final output fusion features comprise two levels, namely local feature fusion and global feature fusion: local feature fusion retains interaction detail information through weighted combinations of adjacent feature-domain capsules (such as the gene-environment fusion capsule), and global feature fusion integrates all feature-domain capsules into a unified high-level representation;
A feature importance assessment method (such as feature importance calculation based on a tree model) is used to quantitatively analyze the local fusion features and global fusion features obtained under the multi-scale feature fusion mechanism, and to determine the contribution of each feature domain and combination thereof to the labeling of rape varieties;
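A tree-style importance measure can be sketched with single-split decision stumps: for each feature, the best threshold split is found and the resulting reduction in target variance is taken as that feature's contribution. This is a simplified stand-in for the importance reported by a full tree model; the feature names, rows and yield values are hypothetical.

```python
def variance(values):
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def best_split_gain(column, target):
    """Largest reduction in target variance achievable with one threshold
    on this feature (the impurity decrease of a decision stump)."""
    n = len(target)
    base = variance(target)
    best = 0.0
    for threshold in sorted(set(column))[:-1]:
        left = [t for x, t in zip(column, target) if x <= threshold]
        right = [t for x, t in zip(column, target) if x > threshold]
        weighted = (len(left) * variance(left) + len(right) * variance(right)) / n
        best = max(best, base - weighted)
    return best


def stump_importance(rows, target, names):
    """Normalize the per-feature gains so that they sum to 1."""
    gains = {
        name: best_split_gain([row[i] for row in rows], target)
        for i, name in enumerate(names)
    }
    total = sum(gains.values()) or 1.0
    return {name: gain / total for name, gain in gains.items()}


# Hypothetical fused features (gene score, environment score) and yields.
rows = [[0.1, 0.5], [0.2, 0.1], [0.8, 0.4], [0.9, 0.2]]
yields = [1.0, 1.0, 3.0, 3.0]
importance = stump_importance(rows, yields, ["gene", "environment"])
print(importance)
```

Here the gene score separates low-yield from high-yield rows perfectly, so it receives most of the importance, while the environment score explains only part of the variance.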
According to the quantitative analysis results, the feature domains and combinations thereof that play a key role in labeling high-yield rape varieties (such as certain gene locus combinations in the gene feature sub-vector, or temperature and humidity combinations in the environment feature sub-vector) are selected as the criteria for subsequent labeling and screening, while features with small contributions or irrelevant features are eliminated. High-yield rape variety data that match the key features are then extracted from the data (for example, 200 records of high-yield rape variety data), and the labeling result of each rape variety is finally generated.
It should be further noted that, in a specific implementation, the transfer learning adaptation and deployment module deploys a generative adversarial network using the extracted high-yield rape variety data, and adapts the quality evaluation model to the feature space of new varieties through transfer learning, so as to intelligently evaluate the quality of new rape seeds. The process is as follows:
The extracted high-yield rape variety data are taken as source-domain data and input into a preset adversarial training sub-module to pre-train the quality evaluation model, so that it learns general feature representations related to rape seed quality. A transfer learning strategy is then adopted to transfer the knowledge the pre-trained model has learned on the high-yield rape variety data set to the task of evaluating the seed quality of new rape varieties; at the same time, the parameters of the quality evaluation model are adjusted and optimized according to the characteristics of the new-variety data, ensuring that the model can accurately evaluate rape seed quality on the new-variety data.
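The pre-train-then-adapt pattern can be sketched with a deliberately small model. A linear regressor trained by stochastic gradient descent stands in for the quality evaluation model, and the adversarial pre-training is not modelled at all; the data sets, learning rate and epoch counts are hypothetical.

```python
def predict(weights, bias, x):
    return sum(w * v for w, v in zip(weights, x)) + bias


def train(data, weights, bias, lr=0.05, epochs=200):
    """Stochastic gradient descent on squared error, starting from the
    given parameters (zeros for pre-training, pre-trained values for
    fine-tuning)."""
    weights = list(weights)
    for _ in range(epochs):
        for x, y in data:
            error = predict(weights, bias, x) - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def mse(data, weights, bias):
    return sum((predict(weights, bias, x) - y) ** 2 for x, y in data) / len(data)


# Source domain: plentiful high-yield variety data (hypothetical).
source = [([0.0, 0.0], 0.0), ([1.0, 0.0], 2.0), ([0.0, 1.0], 1.0), ([1.0, 1.0], 3.0)]
# Target domain: a few samples of a new variety with a shifted quality scale.
target = [([1.0, 0.0], 2.5), ([0.0, 1.0], 1.5), ([0.5, 0.5], 2.0)]

pre_w, pre_b = train(source, [0.0, 0.0], 0.0)           # pre-training
ft_w, ft_b = train(target, pre_w, pre_b, epochs=20)     # brief fine-tuning

print(mse(target, pre_w, pre_b), mse(target, ft_w, ft_b))
```

Starting the short fine-tuning run from the pre-trained parameters lets the model reuse what it learned on the source data and only correct for the shift in the new variety.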
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
It should be understood that determining B from A does not mean that B is determined from A alone; B can also be determined from A and/or other information.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Finally, the foregoing description of the preferred embodiment of the invention is provided for the purpose of illustration only, and is not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims (7)

All collected data sets are stored uniformly in an HBase distributed database and preprocessed, namely: duplicate sites and low-quality sites are removed from the rape whole-genome SNP marker data through data cleaning, and the set of valid SNP sites is retained to obtain a genotype data set; the time-series records of field environmental parameters are parsed and a time-series index of the environmental parameters is established; the unmanned aerial vehicle (UAV) aerial image data are retrieved, features are extracted from the dynamic image data using computer vision techniques, image timestamps are labeled in segments by growth period, and vegetation indices are extracted as primary phenotype features; meanwhile, the collected field phenotype data and the extracted primary phenotype features are aligned in time and space to eliminate sampling-time deviations, thereby obtaining synergistic phenotype features;
CN202510678901.4A | 2025-05-26 | 2025-05-26 | Rape seed quality evaluation system based on big data | Active | CN120196911B (en)

Priority Applications (1)

Application Number | Publication | Priority Date | Filing Date | Title
CN202510678901.4A | CN120196911B (en) | 2025-05-26 | 2025-05-26 | Rape seed quality evaluation system based on big data

Publications (2)

Publication Number | Publication Date
CN120196911A (en) | 2025-06-24
CN120196911B (en) | 2025-08-19

Family

ID=96065963

Family Applications (1)

Application Number | Status | Publication | Priority Date | Filing Date | Title
CN202510678901.4A | Active | CN120196911B (en) | 2025-05-26 | 2025-05-26 | Rape seed quality evaluation system based on big data

Country Status (1)

Country | Link
CN (1) | CN120196911B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN120494211A (en) * | 2025-07-09 | 2025-08-15 | 四川信特农牧科技有限公司 | Feed processing prediction and real-time control method based on deep learning
CN120509606A (en) * | 2025-07-18 | 2025-08-19 | 北京麦麦趣耕科技有限公司 | Rape variety environmental suitability evaluation system based on machine learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2023084543A1 (en) * | 2021-11-12 | 2023-05-19 | Waycool Foods And Products Private Limited | System and method for leveraging neural network based hybrid feature extraction model for grain quality analysis
CN118155721A (en) * | 2024-01-29 | 2024-06-07 | 三亚热带水产研究院 | A method and system for evaluating aquatic germplasm resources based on deep learning
CN119048812A (en) * | 2024-08-12 | 2024-11-29 | 北华航天工业学院 | Corn seed quality assessment method based on deep learning network
CN119131604A (en) * | 2024-11-14 | 2024-12-13 | 达州市农业科学研究院(达州市苎麻科学研究所、达州市薯类作物研究所) | Ramie seed vitality evaluation method and system based on multispectral image analysis
CN119540231A (en) * | 2025-01-21 | 2025-02-28 | 云南省林业和草原科学院 | A method, device, equipment and storage medium for detecting seed quality

Also Published As

Publication number | Publication date
CN120196911B (en) | 2025-08-19

Similar Documents

Publication | Title
Esmaeili et al. | Hyperspectral image band selection based on CNN embedded GA (CNNeGA)
Kamilaris et al. | Deep learning in agriculture: A survey
CN120196911B (en) | Rape seed quality evaluation system based on big data
CN112347970B (en) | Remote sensing image ground object identification method based on graph convolution neural network
Su et al. | LodgeNet: Improved rice lodging recognition using semantic segmentation of UAV high-resolution remote sensing images
US11790410B2 (en) | System and method for natural capital measurement
CN116206158B (en) | Scene image classification method and system based on dual hypergraph neural network
CN118709789A (en) | Crop growth prediction method and system based on artificial intelligence and crop growth model
Bi et al. | A transformer-based approach for early prediction of soybean yield using time-series images
Krishna et al. | Fuzzy-twin proximal SVM kernel-based deep learning neural network model for hyperspectral image classification
CN117314006A (en) | Intelligent data analysis method and system
Guo et al. | Identifying rice field weeds from unmanned aerial vehicle remote sensing imagery using deep learning
CN119762896B (en) | Multi-network fusion hyperspectral image classification method with anti-noise performance
Pazhanikumar et al. | Remote sensing image classification using modified random forest with empirical loss function through crowd-sourced data
CN119599946A (en) | A road crack detection method, device and electronic equipment
CN114881155B (en) | Fruit image classification method based on deep migration learning
Baidar | Rice crop classification and yield estimation using multi-temporal Sentinel-2 data: a case study of terai districts of Nepal
Ding et al. | Land-use classification with remote sensing image based on stacked autoencoder
Li et al. | Early drought plant stress detection with bi-directional long-term memory networks
Anitha | Adaptation of XAI for smart agriculture systems
Batista et al. | Evolutionary machine learning in environmental science
CN119516264B (en) | Remote sensing small sample image classification method based on two-stage fine-tuning multimodal model
CN120429416B (en) | Cross-modal knowledge acquisition and intelligent question-answering methods and equipment for food production
CN119760488B (en) | Part quality grade classification method, device, equipment, medium and product based on uncertain data pattern classification
CN115424138B (en) | A hyperspectral image classification method based on deep neural network

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
