CN116952885A


Info

Publication number: CN116952885A
Application number: CN202310684993.8A
Authority: CN
Inventors: 付贤树; 白雪; 孙敏; 吴诗雯; 张明洲; 俞晓平; 叶子弘; 邱雨楼
Original assignee: China Jiliang University
Current assignee: China Jiliang University
Priority date: 2023-06-09
Filing date: 2023-06-09
Publication date: 2023-10-27

Abstract

The application relates to a tea tracing method and a system based on a coupling relation between a tea producing area and a time factor. The identification model solves the problem that the traditional detection method is limited by the coupling relation between the tea producing area and the time factor, and the identification accuracy is high.

Description

Translated from Chinese

Tea traceability method and system based on the coupling relationship between tea origin and time factors

Technical field

This application relates to the field of quality inspection technology, and in particular to a tea traceability method and system based on the coupling relationship between tea origin and time factors.

Background

Producers, sellers, and testing bodies that deal in genuine tea urgently need to improve the current unfavorable situation. Current tea quality identification techniques mainly include spectroscopy, chromatography, stable isotope labeling, multi-mineral element detection, the taste factor method, and mass spectrometry. Spectroscopy, chromatography, stable isotope labeling, and multi-mineral element detection are time-consuming and labor-intensive and require expensive equipment, which hinders large-scale adoption. The taste factor method and mass spectrometry are fairly accurate, once a model has been built, for tea from the same batch or the same year, but are less accurate for tea from different batches or different years. To overcome this defect, namely that the taste factor method and mass spectrometry are limited by the coupling relationship between tea origin and time factors and therefore identify tea origin with low accuracy, this application proposes a tea traceability method based on the coupling relationship between tea origin and time factors.

Summary of the invention

To overcome the defect that the taste factor method and mass spectrometry are limited by the coupling relationship between tea origin and time factors and therefore identify tea origin with low accuracy, this application proposes a tea traceability method based on the coupling relationship between tea origin and time factors.

This application provides a tea traceability method based on the coupling relationship between tea origin and time factors, including:

selecting an origin;

receiving near-infrared characteristic spectral data of multiple tea samples from the same origin within the same time period;

denoising the near-infrared characteristic spectral data of each tea sample with each of several different data processing algorithms;

extracting the characteristic spectral signals from the denoised near-infrared characteristic spectrum of each tea sample;

receiving near-infrared characteristic spectral data of multiple tea samples from the same origin in another time period, and returning to the step of denoising the near-infrared characteristic spectral data of each tea sample with each of the different data processing algorithms, until the characteristic spectral signals in the near-infrared characteristic spectra of tea samples from all time periods of that origin have been collected;

switching to another origin and returning to the step of receiving near-infrared characteristic spectral data of multiple tea samples from the same origin within the same time period, until the characteristic spectral signals in the near-infrared characteristic spectra of tea samples from every time period of every origin have been collected;

analyzing separately, for every origin and every time period, the characteristic spectral signals obtained under each data processing algorithm, and determining, for each data processing algorithm, the coupling relationship between the characteristic spectral signals in the near-infrared characteristic spectra of tea samples from the same origin across time periods and the origin factor;

using the coupling relationship between the characteristic spectral signals in the near-infrared characteristic spectra of the tea samples and the origin factor to build, under a fully convolutional neural network deep learning algorithm, an identification model based on the characteristic spectral signals in the near-infrared characteristic spectra of tea samples;

merging the tea from different time periods of an origin, in gradients, into that origin's year-0 tea sample pool, establishing such a sample pool for each origin, and training the identification model with the origin sample pools;

comparing, for the same origin, the accuracy of the identification models built under each data processing algorithm, and selecting the model built under the most accurate data processing algorithm as the final identification model;

receiving the near-infrared characteristic spectral data of a tea sample to be tested;

feeding the near-infrared characteristic spectral data of the tea sample to be tested into the final identification model, and outputting a table of the sample's credibility values for each origin.

This application also provides a system for the tea traceability method based on the coupling relationship between tea origin and time factors, including:

a host computer, used to execute the tea traceability method based on the coupling relationship between tea origin and time factors;

a near-infrared detector, used to scan tea samples and acquire their near-infrared characteristic spectral data, and communicatively connected to the host computer.

This application relates to a tea traceability method and system based on the coupling relationship between tea origin and time factors. Near-infrared characteristic spectral data of tea samples are received, and each received spectrum is denoised and fused with several different data processing algorithms, each of which yields a near-infrared characteristic spectrum of the tea sample. Characteristic spectral signals are extracted from these spectra and used to determine, for each data processing algorithm, the coupling relationship between the characteristic spectral signals of tea samples from the same origin across time periods and the origin factor. Based on this coupling relationship, an identification model based on the characteristic spectral signals in the near-infrared characteristic spectral data images of tea samples is built under a fully convolutional neural network deep learning algorithm. The identification model overcomes the limitation that the coupling between tea origin and time factors imposes on traditional detection methods, and identifies tea origin with high accuracy.

Description of the drawings

The accompanying drawings, which form part of this application, provide a further understanding of the application and make its other features, objects, and advantages more apparent. The drawings of the illustrative embodiments and their descriptions serve to explain the application and do not unduly limit it.

Figure 1 is a flow chart of a tea traceability method based on the coupling relationship between tea origin and time factors, provided by an embodiment of this application.

Figure 2 is a module connection diagram of a system for the tea traceability method based on the coupling relationship between tea origin and time factors, provided by an embodiment of this application.

Reference signs:

100: host computer; 200: near-infrared detector.

Detailed description

To make the purpose, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. The specific embodiments described here serve only to explain the application and do not limit it.

This application provides a tea traceability method and system based on the coupling relationship between tea origin and time factors.

As shown in Figure 1, in one embodiment of this application, a tea traceability method based on the coupling relationship between tea origin and time factors includes:

S100: Select an origin.

S200: Receive near-infrared characteristic spectral data of multiple tea samples from the same origin within the same time period, denoise the data of each tea sample with each of several different data processing algorithms, and extract the characteristic spectral signals from the denoised near-infrared characteristic spectrum of each tea sample.

S300: Receive near-infrared characteristic spectral data of multiple tea samples from the same origin in another time period, and return to the step of denoising the data of each tea sample with each of the different data processing algorithms, until the characteristic spectral signals in the near-infrared characteristic spectra of tea samples from all time periods of that origin have been collected.

S400: Switch to another origin and return to the step of receiving near-infrared characteristic spectral data of multiple tea samples from the same origin within the same time period, until the characteristic spectral signals in the near-infrared characteristic spectra of tea samples from every time period of every origin have been collected.

S500: Analyze separately, for every origin and every time period, the characteristic spectral signals obtained under each data processing algorithm, and determine, for each data processing algorithm, the coupling relationship between the characteristic spectral signals in the near-infrared characteristic spectra of tea samples from the same origin across time periods and the origin factor.

S600: Use the coupling relationship between the characteristic spectral signals in the near-infrared characteristic spectra of the tea samples and the origin factor to build, under a fully convolutional neural network deep learning algorithm, an identification model based on the characteristic spectral signals in the near-infrared characteristic spectra of tea samples.

S700: Merge the tea from different time periods of an origin, in gradients, into that origin's year-0 tea sample pool, establish such a sample pool for each origin, and train the identification model with the origin sample pools.

S800: Compare, for the same origin, the accuracy of the identification models built under each data processing algorithm, and select the model built under the most accurate data processing algorithm as the final identification model.

S900: Receive the near-infrared characteristic spectral data of the tea sample to be tested, feed it into the final identification model, and output a table of the sample's credibility values for each origin.
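The nested iteration in S100 to S400 (origin, then time period, then data processing algorithm) can be sketched as below. This is only an illustrative outline under assumptions: the function name `collect_features`, the dictionary layout, and the idea of representing each algorithm as a plain callable are not taken from the application.

```python
from typing import Callable, Dict, List, Tuple

Spectrum = List[float]
# Hypothetical layout: (origin, time_period, algorithm_name) -> processed spectra
FeatureStore = Dict[Tuple[str, str, str], List[Spectrum]]


def collect_features(
    samples: Dict[str, Dict[str, List[Spectrum]]],
    algorithms: Dict[str, Callable[[Spectrum], Spectrum]],
) -> FeatureStore:
    """Apply every algorithm to every sample of every origin and time period."""
    store: FeatureStore = {}
    for origin, periods in samples.items():        # S100 / S400: iterate origins
        for period, spectra in periods.items():    # S200 / S300: iterate time periods
            for name, preprocess in algorithms.items():
                # Each algorithm yields its own processed copy of each spectrum.
                store[(origin, period, name)] = [preprocess(s) for s in spectra]
    return store
```

Keeping the results keyed by algorithm name makes the later per-algorithm comparison (S500 and S800) a simple lookup.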

Specifically, this application proposes a study, based on model iteration and deep learning, of the coupling relationship between the tea origin factor and the time factor. In this embodiment, Zhejiang Longjing and Wuyi rock tea are the research objects. Near-infrared spectra are collected for tea from different origins, picked at different times in multiple tea gardens. Eight preprocessing algorithms, in single and fused modes, are compared: Savitzky-Golay convolution smoothing (SG), multiplicative scatter correction (MSC), standard normal variate transformation (SNV), second derivative (SD), SG+MSC (SG-MSC), SG+SNV (SG-SNV), SD+MSC (SD-MSC), and SD+SNV (SD-SNV). The better preprocessing algorithm is determined, the differences in response signals between tea from different origins are analyzed, the spectral characteristic information of Zhejiang Longjing and Wuyi rock tea is extracted, the way the characteristic signals of tea from the same origin change with picking time is studied, and the coupling relationship between the picking time factor and the origin factor is analyzed. This systematically reveals the relationship between the characteristic signals of Zhejiang Longjing and Wuyi rock tea and their origins. Deep learning algorithms such as convolutional neural networks are then trained on the spectral characteristic signals of Zhejiang Longjing and Wuyi rock tea to build an identification model based on the characteristic spectral signals in the near-infrared characteristic spectral data images of tea samples.

It is worth noting that Zhejiang Longjing and Wuyi rock tea are entirely different kinds of tea and can be told apart by appearance alone. This application has two main purposes:

1. Identifying finished tea of the same kind from different origins. For example, a region may have several growing areas producing the same category of green tea. Traditional methods struggle to tell these apart, yet tea from one particular growing area may carry a higher added value. Distinguishing the finished green tea of that growing area from that of the others is one of the main subjects of this application.

More specifically, the identification model holds several coupling relationships, one per origin, each relating the characteristic spectral signals in the near-infrared characteristic spectra of tea samples from that origin across time periods to the origin factor. The near-infrared characteristic spectrum of the sample to be tested is compared against these in turn: it is first checked against the coupling relationship of the first origin; if it does not match, it is checked against the coupling relationship of the second origin; and so on.

2. Identifying the year of tea from the same origin, for example semi-fermented or fermented teas typified by Wuyi rock tea. The older such tea is, the higher its value. This is the second main subject of this application.

More specifically, once the identification model has established the coupling relationship for a given origin, the near-infrared characteristic spectrum of the sample to be tested is correlated with that coupling relationship to determine the credibility that the sample belongs to each time period. The results are shown in the credibility table. For example, if the credibility is 20% for one-year tea from that origin, 90% for three-year tea, and 30% for five-year tea, the sample can essentially be identified as three-year tea.
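Reading the credibility table amounts to picking the entry with the highest value. The snippet below is a minimal sketch using the three figures quoted in the example; the function name `most_credible` and the label strings are illustrative assumptions. Note that the quoted values do not sum to 100%, so they are treated here as independent scores rather than a probability distribution.

```python
from typing import Dict, Tuple


def most_credible(credibility: Dict[str, float]) -> Tuple[str, float]:
    """Return the label with the highest credibility and its value."""
    label = max(credibility, key=credibility.get)
    return label, credibility[label]


# Values from the example in the text (as fractions rather than percentages).
table = {"1-year": 0.20, "3-year": 0.90, "5-year": 0.30}
print(most_credible(table))  # ('3-year', 0.9)
```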

This embodiment relates to a tea traceability method and system based on the coupling relationship between tea origin and time factors. Near-infrared characteristic spectral data of tea samples are received, and each received spectrum is denoised and fused with several different data processing algorithms, each of which yields a near-infrared characteristic spectrum of the tea sample. Characteristic spectral signals are extracted from these spectra and used to determine, for each data processing algorithm, the coupling relationship between the characteristic spectral signals of tea samples from the same origin across time periods and the origin factor. Based on this coupling relationship, an identification model based on the characteristic spectral signals in the near-infrared characteristic spectral data images of tea samples is built under a fully convolutional neural network deep learning algorithm. The identification model overcomes the limitation that the coupling between tea origin and time factors imposes on traditional detection methods, and identifies tea origin with high accuracy.

In an embodiment of this application, step S200 includes:

S211: Map the near-infrared characteristic spectral data of each tea sample to a near-infrared characteristic spectrum of that sample.

S212: Apply the Savitzky-Golay convolution smoothing algorithm to the near-infrared characteristic spectrum of the tea sample to obtain a first image.

S213: Apply the multiplicative scatter correction algorithm to the near-infrared characteristic spectrum of the tea sample to obtain a second image.

S214: Apply the standard normal variate transformation algorithm to the near-infrared characteristic spectrum of the tea sample to obtain a third image.

S215: Apply the second-derivative algorithm to the near-infrared characteristic spectrum of the tea sample to obtain a fourth image.

This embodiment concerns the choice of data processing algorithm for the near-infrared characteristic spectral data of the samples. With Zhejiang Longjing and Wuyi rock tea as the research objects, near-infrared spectra are collected for tea from different origins, picked at different times in multiple tea gardens. The near-infrared characteristic spectrum mapped from each sample's spectral data is then denoised with Savitzky-Golay convolution smoothing (SG), multiplicative scatter correction (MSC), standard normal variate transformation (SNV), and second derivative (SD) to obtain the corresponding images. Different data sets suit different preprocessing algorithms: no single preprocessing algorithm works best on every experimental data set, and single and fused preprocessing algorithms give different results. Processing every sample with several algorithms therefore improves how well the coupling relationship between the characteristic spectral signals in the sample's near-infrared characteristic spectral data image and the origin-time factors can be fitted.
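The four single-mode treatments can be sketched in plain Python as follows. This is a minimal illustration under assumptions, not the application's implementation: it treats a spectrum as a 1-D list of absorbance values rather than an image, uses the tabulated 5-point quadratic Savitzky-Golay kernels (the application does not state a window size or polynomial order), uses the sample standard deviation in SNV, and takes the mean spectrum as the MSC reference. In practice a library routine such as `scipy.signal.savgol_filter` would normally replace the hand-written filter.

```python
from statistics import mean, stdev
from typing import List, Sequence

# Tabulated 5-point, quadratic Savitzky-Golay kernels (unit wavelength spacing).
SG_SMOOTH = [c / 35 for c in (-3, 12, 17, 12, -3)]
SG_SECOND_DERIV = [c / 7 for c in (2, -1, -2, -1, 2)]


def _filter(x: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Slide a symmetric kernel over x, padding the edges with the end values."""
    half = len(kernel) // 2
    padded = [x[0]] * half + list(x) + [x[-1]] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]


def sg_smooth(x: Sequence[float]) -> List[float]:
    """SG: Savitzky-Golay smoothing."""
    return _filter(x, SG_SMOOTH)


def second_derivative(x: Sequence[float]) -> List[float]:
    """SD: second derivative, estimated with a Savitzky-Golay kernel."""
    return _filter(x, SG_SECOND_DERIV)


def snv(x: Sequence[float]) -> List[float]:
    """SNV: centre each spectrum on its own mean and scale by its own std."""
    m, s = mean(x), stdev(x)
    return [(v - m) / s for v in x]


def msc(spectra: Sequence[Sequence[float]]) -> List[List[float]]:
    """MSC: regress each spectrum on the mean spectrum, then remove the
    fitted offset and slope."""
    ref = [mean(col) for col in zip(*spectra)]
    r_mean = mean(ref)
    sxx = sum((r - r_mean) ** 2 for r in ref)
    corrected = []
    for x in spectra:
        x_mean = mean(x)
        slope = sum((r - r_mean) * (v - x_mean) for r, v in zip(ref, x)) / sxx
        intercept = x_mean - slope * r_mean
        corrected.append([(v - intercept) / slope for v in x])
    return corrected
```

SNV and the two filters act on one spectrum at a time, whereas MSC needs the whole batch (or a stored reference spectrum) because it corrects each spectrum relative to a common reference.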

In an embodiment of this application, step S200 also includes:

S221: Apply Savitzky-Golay convolution smoothing and multiplicative scatter correction to the near-infrared characteristic spectrum of the tea sample to obtain a fifth image.

S222: Apply Savitzky-Golay convolution smoothing and standard normal variate transformation to the near-infrared characteristic spectrum of the tea sample to obtain a sixth image.

S223: Apply the second derivative and multiplicative scatter correction to the near-infrared characteristic spectrum of the tea sample to obtain a seventh image.

S224: Apply the second derivative and standard normal variate transformation to the near-infrared characteristic spectrum of the tea sample to obtain an eighth image.

This embodiment concerns the choice of data processing algorithm for the near-infrared characteristic spectral data of the samples. With Zhejiang Longjing and Wuyi rock tea as the research objects, near-infrared spectra are collected for tea from different origins, picked at different times in multiple tea gardens. The near-infrared characteristic spectrum mapped from each sample's spectral data is then subjected to fused denoising with convolution smoothing plus multiplicative scatter correction, SG+MSC (SG-MSC); convolution smoothing plus standard normal variate transformation, SG+SNV (SG-SNV); second derivative plus multiplicative scatter correction, SD+MSC (SD-MSC); and second derivative plus standard normal variate transformation, SD+SNV (SD-SNV), to obtain the corresponding images. Different data sets suit different preprocessing algorithms: no single preprocessing algorithm works best on every experimental data set, and single and fused preprocessing algorithms give different results. Processing every sample with several algorithms therefore improves how well the coupling relationship between the characteristic spectral signals in the sample's near-infrared characteristic spectral data image and the origin-time factors can be fitted.
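One way to organise the four single treatments and the four fused ones is a registry of eight named pipelines built by function composition. The sketch below assumes that a fused name such as SG-MSC means "apply SG first, then MSC"; the application does not state the order. The helper names `compose` and `build_pipelines` are hypothetical, and each step is assumed to be a callable on a single spectrum (for MSC, one already bound to its reference spectrum).

```python
from typing import Callable, Dict, List

Step = Callable[[List[float]], List[float]]


def compose(*steps: Step) -> Step:
    """Chain preprocessing steps, applying them left to right."""
    def pipeline(spectrum: List[float]) -> List[float]:
        for step in steps:
            spectrum = step(spectrum)
        return spectrum
    return pipeline


def build_pipelines(sg: Step, msc: Step, snv: Step, sd: Step) -> Dict[str, Step]:
    """Return the eight named treatments: four single, four fused."""
    return {
        "SG": sg,
        "MSC": msc,
        "SNV": snv,
        "SD": sd,
        "SG-MSC": compose(sg, msc),   # assumed order: smooth, then correct
        "SG-SNV": compose(sg, snv),
        "SD-MSC": compose(sd, msc),
        "SD-SNV": compose(sd, snv),
    }
```

Each entry then produces one of the first to eighth images for a given sample.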

S231: Compute the accuracy of the identification models, based on the characteristic spectral signals in the near-infrared characteristic spectra of tea samples, that are built from the first to eighth images of each tea sample from the same origin in each time period.

S232: Feed the accuracy values for the first to eighth images of the multiple tea samples from the same origin in each time period, in turn, into a semi-supervised algorithm to obtain the first to eighth mean values respectively.

S233: Take the identification model, based on the characteristic spectral signals in the near-infrared characteristic spectra of tea samples, that corresponds to the data processing algorithm with the highest of the first to eighth mean values.
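The selection in S231 to S233 reduces to averaging accuracies per data processing algorithm and keeping the best one. The sketch below assumes a plain arithmetic mean; the application routes the values through an unspecified semi-supervised algorithm, which is not reproduced here. The function name and the accuracy figures in the usage note are illustrative only.

```python
from statistics import mean
from typing import Dict, List, Tuple


def select_preprocessing(
    accuracy: Dict[str, List[float]],
) -> Tuple[str, Dict[str, float]]:
    """Average the per-time-period accuracies of each algorithm and
    return the algorithm with the highest mean, plus all the means."""
    means = {name: mean(values) for name, values in accuracy.items()}
    best = max(means, key=means.get)
    return best, means
```

For instance, with made-up accuracies `{"SG": [0.90, 0.80], "SNV": [0.95, 0.91]}`, the function returns `"SNV"` as the algorithm whose model becomes the final identification model.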

Specifically, the one-against-one (OAO) decomposition strategy divides a K-class problem into K(K-1)/2 parallel binary classifiers, each of which evaluates one class against one other class. The votes for each class are counted across the K(K-1)/2 parallel classifiers, and the class of a new object is predicted by the maximum-wins voting strategy.
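The OAO voting rule can be sketched as follows. The pairwise classifier is passed in as a hypothetical callable `pairwise_winner(a, b, x)` that returns whichever of the two classes it prefers for sample `x`; how ties between classes are broken is not specified in the text, so this sketch simply keeps the first class to reach the top count.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Sequence, Tuple


def oao_predict(
    classes: Sequence[str],
    pairwise_winner: Callable[[str, str, float], str],
    x: float,
) -> Tuple[str, Counter]:
    """Run all K(K-1)/2 pairwise classifiers and return the class
    with the most votes, together with the vote counts."""
    votes: Counter = Counter()
    for a, b in combinations(classes, 2):   # every unordered pair of classes
        votes[pairwise_winner(a, b, x)] += 1
    winner = votes.most_common(1)[0][0]
    return winner, votes
```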

The one-against-all (OAA) decomposition strategy decomposes a K-class problem into K binary classification problems, each solved by one binary classifier. Each binary classifier is trained on all of the training data, with the data of one class as the positive class and the data of all other classes as the negative class. In the validation phase, a classifier trained with the i-th class as the positive class and the other classes as the negative class outputs, for a new sample X, a result that expresses the credibility that X belongs to class i. This finally yields a score vector: p = (p1, p2, p3, ..., pi, ..., pK).
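Turning the OAA score vector into a prediction is a matter of taking the class with the largest score. In this minimal sketch each per-class scorer is a hypothetical callable supplied by the caller; nothing about how the binary classifiers are trained is assumed.

```python
from typing import Callable, Dict, Tuple


def oaa_predict(
    scorers: Dict[str, Callable[[float], float]],
    x: float,
) -> Tuple[str, Dict[str, float]]:
    """Evaluate the K one-against-all scorers on x and return the class
    with the largest score, plus the full score vector p."""
    p = {label: score(x) for label, score in scorers.items()}
    return max(p, key=p.get), p
```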

The proposed exhaustive and parallel half-against-half (EPHAH) decomposition method exhaustively decomposes a K-class problem into parallel two-class classifiers induced by a strategy of splitting the classes into two groups. The number of all even (half-and-half) two-group splits of the K classes equals the binomial coefficient C(K, K/2). Note that when K is even, only C(K, K/2)/2 parallel two-class classifiers need to be built, because the other half are duplicates.
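The counting argument can be checked by enumerating the splits directly. The sketch below assumes that for odd K the smaller group has floor(K/2) classes, which the text does not spell out; the function name is illustrative.

```python
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple

Split = Tuple[Tuple[str, ...], Tuple[str, ...]]


def half_half_splits(classes: Sequence[str]) -> List[Split]:
    """Enumerate every distinct split of the classes into two halves."""
    half = len(classes) // 2
    seen = set()
    splits: List[Split] = []
    for group in combinations(classes, half):
        rest = tuple(c for c in classes if c not in group)
        key = frozenset([frozenset(group), frozenset(rest)])
        if key in seen:
            # For even K, (group | rest) and (rest | group) are the same split.
            continue
        seen.add(key)
        splits.append((group, rest))
    return splits


# Four classes: C(4, 2) = 6 ordered choices, but only 3 distinct splits.
print(len(half_half_splits(["A", "B", "C", "D"])), comb(4, 2) // 2)  # 3 3
```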

For the K groups to be analyzed, OAO builds a two-class classifier for each pair of groups, giving K(K-1)/2 parallel OAO classifiers in total. All K(K-1)/2 models predict the test object, which is assigned by majority vote. OAA builds K two-class (one-against-(K-1)) models. In the j-th model (j = 1, 2, 3, ..., K), group j is coded +1 and the other (K-1) groups are coded -1. The object to be predicted receives K predicted responses from these K OAA models and is assigned to the class with the largest response. EPHAH decomposes the K-class problem into C(K, K/2) parallel two-class classifiers and uses the max-wins voting strategy to assign a new sample to the class with the most votes.
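The three decompositions can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation: the function names are hypothetical, only an even number of classes is handled for EPHAH, and the voting rule (every class in the winning half of a partition receives one vote) is an assumption, since the text only says that max-wins voting is used.

```python
from collections import Counter
from itertools import combinations


def oao_pairs(classes):
    """All K(K-1)/2 one-against-one class pairs."""
    return list(combinations(classes, 2))


def oaa_tasks(classes):
    """K one-against-all tasks: (positive class, list of negative classes)."""
    return [(c, [d for d in classes if d != c]) for c in classes]


def ephah_partitions(classes):
    """Distinct half-against-half partitions for an even number of classes.

    Fixing the first class in the left half removes mirrored duplicates,
    leaving C(K, K/2) / 2 partitions.
    """
    k = len(classes)
    if k % 2:
        raise ValueError("this sketch only handles an even number of classes")
    first, rest = classes[0], classes[1:]
    partitions = []
    for chosen in combinations(rest, k // 2 - 1):
        left = (first,) + tuple(chosen)
        right = tuple(c for c in classes if c not in left)
        partitions.append((left, right))
    return partitions


def ephah_vote(partitions, winning_sides):
    """Max-wins voting (assumed rule): each class in the winning half gets a vote.

    winning_sides[i] is 0 if classifier i chose the left half, 1 for the right.
    """
    votes = Counter()
    for (left, right), side in zip(partitions, winning_sides):
        votes.update(left if side == 0 else right)
    return votes.most_common(1)[0][0]
```

For K = 4 this gives 6 OAO pairs, 4 OAA tasks and 3 EPHAH partitions; a sample of class A wins every partition on the side that contains A, so A collects the most votes.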

Partial least squares discriminant analysis (PLSDA) is a multivariate statistical method that assigns objects to classes based on measurements of multiple variables. A model is trained on the features of samples with known classes, and its reliability is then tested. In PLSDA, a dummy response vector of +1 (class a) and -1 (class b) is constructed, with one entry for each object in the feature matrix. The threshold on the predicted response can be set to 0: an object whose predicted response is greater than 0 is assigned to class a, and one whose predicted response is less than 0 is assigned to class b. A key parameter of PLSDA is the number of significant latent variables (LVs). In this study, Monte Carlo cross-validation (MCCV) is used to evaluate model complexity, and the number of LVs is chosen to minimize the MCCV error rate (ERMCCV).
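A minimal sketch of how an MCCV error rate could be computed for a two-class model with +1/-1 labels and a decision threshold of 0. The names `mccv_error_rate`, `fit` and `predict` are hypothetical, and the number of splits and the 30% hold-out fraction are assumptions; in a real PLSDA workflow `fit` would train a PLS model with a given number of LVs, and the LV count with the lowest error rate would be kept.

```python
import random


def mccv_error_rate(samples, labels, fit, predict,
                    n_splits=100, test_frac=0.3, seed=0):
    """Average misclassification rate over repeated random train/test splits."""
    rng = random.Random(seed)
    n = len(samples)
    n_test = max(1, int(round(n * test_frac)))
    errors = 0
    total = 0
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        for i in test:
            response = predict(model, samples[i])
            predicted = 1 if response > 0 else -1  # PLSDA-style cut at 0
            errors += predicted != labels[i]
            total += 1
    return errors / total


# Toy stand-in for a trained discriminant: midpoint between the class means.
def toy_fit(xs, ys):
    pos = [x[0] for x, y in zip(xs, ys) if y == 1]
    neg = [x[0] for x, y in zip(xs, ys) if y == -1]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def toy_predict(midpoint, x):
    return x[0] - midpoint
```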

This embodiment concerns the choice of data processing algorithm for the near-infrared (NIR) characteristic spectral data of the samples. Even when the same tea sample is scanned repeatedly on the same NIR instrument, the results differ: the bulk of the data is the same across scans, but some anomalous absorption peaks appear. The main purpose of these algorithms is denoising, that is, removing the anomalous absorption peaks. Because the influence of such peaks is amplified by subsequent chemometric processing, they must be removed by the data processing algorithm. At the same time, the differences among the samples' NIR characteristic spectra are small, so a similarity parameter between the smoothed and unsmoothed NIR characteristic spectra of a tea sample is introduced to ensure that the characteristic spectral signals extracted later are not processed away by mistake. In short, the similarity parameter improves the accuracy with which characteristic spectral signals are identified.
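As an illustration of the smoothing-plus-similarity idea, the sketch below applies a 5-point quadratic Savitzky-Golay filter and then compares the smoothed and raw spectra. The choice of filter width and the use of the Pearson correlation coefficient as the similarity parameter are assumptions; the text does not specify how similarity is measured.

```python
import math

# 5-point quadratic Savitzky-Golay smoothing weights (they sum to 35).
SG5 = (-3, 12, 17, 12, -3)


def sg_smooth5(y):
    """Smooth a spectrum; the two points at each end are left unchanged."""
    out = list(y)
    for i in range(2, len(y) - 2):
        out[i] = sum(w * y[i + j - 2] for j, w in enumerate(SG5)) / 35.0
    return out


def pearson(a, b):
    """Pearson correlation, used here as an assumed similarity parameter."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


raw = [1.0] * 11
raw[5] = 2.0  # a single anomalous spike
smoothed = sg_smooth5(raw)
similarity = pearson(raw, smoothed)
```

The spike is attenuated while the rest of the spectrum is untouched; a similarity that drops too low would signal that the smoothing has altered genuine spectral features.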

Note that abnormal samples can still appear among the denoised NIR characteristic spectra. The abnormality here is not an anomalous absorption peak but an abnormal distribution of the whole spectrum, so such samples must be removed. They are removed with an NIR-based outlier elimination method that has two main steps: first, principal component analysis is performed on the data set to determine the number of principal components; second, the mean and variance of the prediction residuals are determined.

In an embodiment of this application, step S300 includes:

S310: Standardize the wavelengths of the near-infrared characteristic spectrum of the tea sample.

S320: Calculate the covariance matrix of absorbance and wavelength for the near-infrared characteristic spectrum of the tea sample to identify correlations.

S330: Calculate the eigenvectors and eigenvalues of this covariance matrix to identify the target wavelengths.
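Steps S310 to S330 can be read as a standardize, covariance, eigen-decomposition pipeline. The sketch below is one possible reading, under the assumptions that rows are samples, columns are wavelength channels, standardization means centering each column and scaling it to unit variance, and only the leading eigenpair is needed (found here by power iteration). All function names are hypothetical.

```python
import math


def standardize_columns(X):
    """Center each wavelength channel and scale it to unit sample variance."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    stds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / (n - 1)) or 1.0
            for j in range(p)]
    return [[(row[j] - means[j]) / stds[j] for j in range(p)] for row in X]


def covariance(Z):
    """Sample covariance matrix between the columns of Z."""
    n, p = len(Z), len(Z[0])
    m = [sum(row[j] for row in Z) / n for j in range(p)]
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in Z) / (n - 1)
             for j in range(p)] for i in range(p)]


def leading_eigenpair(C, iters=500):
    """Largest eigenvalue and its eigenvector by power iteration.

    Assumes the start vector is not orthogonal to the leading eigenvector.
    """
    p = len(C)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v
```

Channels with large absolute loadings in the leading eigenvector vary together across samples and are natural candidates for target wavelengths.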

Specifically, commonly used characteristic band selection algorithms include interval partial least squares (I-PLS) and synergy interval partial least squares (SI-PLS).

I-PLS divides the whole spectrum into smaller, uniform regions, which is the purpose of the wavelength standardization, and then builds a separate PLS regression model for each sub-region using the same number of latent variables. Concretely, I-PLS first divides the whole spectrum into k equal intervals and performs partial least squares regression on each interval, giving k regression models. The root mean square error of cross-validation (RMSECV) is computed for each of the k models, and the model for the interval with the smallest RMSECV is taken as the optimal model. Once the optimal regression model has been determined, the correlations in the covariance matrix of absorbance and wavelength are calculated on the basis of that model.

More specifically, SI-PLS is an effective characteristic band selection algorithm proposed by Norgaard et al. on the basis of I-PLS. It operates on the k sub-intervals produced by I-PLS: j (2 <= j <= k) of the k intervals are combined into a joint interval on which a PLS model is built, for a total of C(k, j) PLS models, and the combination of j intervals with the smallest RMSECV is the optimal interval. The computational cost of SI-PLS depends strongly on k and j. For fixed k, the number of models grows rapidly as j increases, so j should not be set too large; in general j is less than 6. SI-PLS further removes the redundancy of contiguous band intervals, and the characteristic bands it selects benefit the classification models built later and the study of characteristic intervals.
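The interval search can be sketched as follows. This is an illustrative outline rather than the patent's code: `split_intervals` and `best_interval_combination` are hypothetical names, the sketch enumerates every combination of j intervals (which is what yields C(k, j) models), and `rmsecv` is a caller-supplied function that would, in practice, cross-validate a PLS model on the given columns.

```python
from itertools import combinations
from math import comb


def split_intervals(n_points, k):
    """Split indices 0..n_points-1 into k contiguous, near-equal intervals."""
    base, extra = divmod(n_points, k)
    intervals, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        intervals.append(range(start, start + size))
        start += size
    return intervals


def best_interval_combination(intervals, j, rmsecv):
    """Score every combination of j intervals and return (score, combination)."""
    best = None
    for combo in combinations(range(len(intervals)), j):
        columns = [c for i in combo for c in intervals[i]]
        score = rmsecv(columns)
        if best is None or score < best[0]:
            best = (score, combo)
    return best


# Number of PLS models to fit for k = 20 intervals.
models_needed = {j: comb(20, j) for j in (2, 3, 4)}
```

With k = 20, moving from j = 2 to j = 4 raises the model count from 190 to 4845, which is why j is kept small.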

This embodiment concerns the determination of the NIR characteristic spectrum of the sample. The absorption peaks that remain in the sample's NIR characteristic spectrum after selection by the semi-supervised algorithm are standardized at their corresponding wavelengths. The covariance matrix of these absorption peaks and wavelengths reveals their correlations, and the eigenvectors and eigenvalues of that covariance matrix are used to extract the target wavelengths. Each target wavelength corresponds to an absorption peak, which quantifies the indicators used to assess the origin and time of a tea sample and improves detection accuracy.

In an embodiment of this application, step S300 further includes:

S340: Select the target wavelengths of the near-infrared characteristic spectrum of the tea sample.

S350: Sort the target wavelengths by absorbance from high to low.

S360: Use the successive projections algorithm (SPA) to select the characteristic spectral signals required by the identification model based on characteristic spectral signals in the near-infrared characteristic spectral data of tea samples.

This embodiment concerns the determination of the NIR characteristic spectrum of the sample. The selected target wavelengths of the tea sample's NIR characteristic spectrum are sorted by absorbance from high to low. For multiple tea samples from the same origin and the same time, the successive projections algorithm (SPA) is used to select the characteristic spectral signals required by the identification model. These characteristic spectral signals are the signature data for that origin and time period. Being quantified and concrete, the signature data improve both the accuracy of the coupling relationship between tea origin and time factors and the efficiency with which it is established.
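A compact sketch of the successive projections algorithm, assuming a data matrix whose rows are samples and whose columns are candidate wavelengths. The starting column and the number of wavelengths to keep are inputs that the text does not specify, and `spa_select` is a hypothetical name. At each step the remaining columns are projected onto the subspace orthogonal to the last selected column, and the column with the largest remaining norm is chosen, which favors wavelengths carrying the least collinear information.

```python
import math


def spa_select(X, start, n_select):
    """Return the indices of n_select columns chosen by successive projections."""
    n, p = len(X), len(X[0])
    # Column-wise working copy; columns are overwritten by their projections.
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_sq = sum(v * v for v in ref)
        best_j, best_norm = None, -1.0
        for j in range(p):
            if j in selected:
                continue
            # Remove the component of column j that lies along the reference.
            coef = sum(a * b for a, b in zip(cols[j], ref)) / ref_sq
            cols[j] = [a - coef * b for a, b in zip(cols[j], ref)]
            norm = math.sqrt(sum(v * v for v in cols[j]))
            if norm > best_norm:
                best_j, best_norm = j, norm
        selected.append(best_j)
    return selected
```

In the small example used to check this sketch, column 1 is an exact multiple of column 0, so it is reduced to zero after the first projection and is never selected.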

In an embodiment of this application, step S500 includes:

S510: Create a statistical data table.

S520: Import into the statistical data table the characteristic spectral signals, and their corresponding absorbances, from the near-infrared characteristic spectra of tea samples from the same origin in different time periods.

S530: Analyze how the absorbance of the characteristic spectral signals in each time period changes over time.

S540: Use partial least squares to fit the change in absorbance of the characteristic spectral signals over time, obtaining the coupling function between tea origin and time factors.

This embodiment concerns fitting the coupling function between tea origin and time factors. Partial least squares is a multivariate statistical data analysis method first proposed in 1983 by S. Wold, C. Albano and others. Within a single algorithm it performs regression modeling (multiple linear regression), simplification of the data structure (principal component analysis), and analysis of the correlation between two sets of variables (canonical correlation analysis), which was a major advance in multivariate statistical data analysis.

Partial least squares regression is mainly used for linear regression problems with multiple dependent and multiple independent variables. It has advantages over classical linear regression especially when the variables are multicollinear, when there are many variables but few samples, or when heteroscedasticity is present. Because its analysis combines features of principal component analysis, canonical correlation analysis and linear regression analysis, it allows a more in-depth analysis and fit of the coupling function between tea origin and time factors, provides richer information, and yields more reasonable results.

Characteristics of partial least squares regression:

(1) It can build a regression model when the independent variables are severely multicollinear;

(2) It can build a regression model when the number of samples is smaller than the number of variables;

(3) The final partial least squares regression model includes all of the original independent variables;

(4) A partial least squares regression model makes it easier to distinguish systematic information from noise (even some non-random noise);

(5) In a partial least squares regression model, the regression coefficient of each independent variable is easier to interpret.

Modeling method:

Suppose there are q origin-and-time likelihood outputs and p characteristic spectral signal parameters. To study the statistical relationship between them, the near-infrared spectra of n tea samples from the same origin at different times are observed, giving a data table X of the characteristic spectral signal parameters and a data table Y of the origin-and-time likelihood outputs. Partial least squares regression extracts a component t from X and a component u from Y such that t and u each carry as much of the variation in their own data table as possible and the correlation between t and u is maximized. After the first component has been extracted, partial least squares regression regresses X on t and Y on t. If the regression equation has reached satisfactory accuracy, the algorithm terminates; otherwise, a second round of component extraction is performed on the residuals of X and of Y after they have been explained by t. This is repeated until satisfactory accuracy is reached. If several components are finally extracted from X, partial least squares regression regresses yk on these components and then re-expresses the result as a regression equation of yk on the original independent variables.
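The extract, regress, deflate loop described above can be illustrated with a single-response (PLS1) version of the NIPALS algorithm. This is a simplified sketch: it handles one output yk at a time, so no separate component u is extracted from Y as in the multi-response case, and `pls1_fit` and `pls1_predict` are hypothetical names. Prediction replays the same deflation on a new sample instead of forming the regression coefficients explicitly.

```python
import math


def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns (column means of X, mean of y, components)."""
    n, p = len(X), len(X[0])
    x_mean = [sum(r[j] for r in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[r[j] - x_mean[j] for j in range(p)] for r in X]  # X residual
    f = [v - y_mean for v in y]                            # y residual
    comps = []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left in y to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        c = sum(f[i] * t[i] for i in range(n)) / tt  # regression of y on t
        # Deflate: keep only what t has not explained.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - c * t[i] for i in range(n)]
        comps.append((w, load, c))
    return x_mean, y_mean, comps


def pls1_predict(model, x):
    """Predict y for one new sample by repeating the deflation steps."""
    x_mean, y_mean, comps = model
    e = [a - b for a, b in zip(x, x_mean)]
    y_hat = y_mean
    for w, load, c in comps:
        t = sum(a * b for a, b in zip(e, w))
        y_hat += c * t
        e = [a - t * b for a, b in zip(e, load)]
    return y_hat
```

When as many components are extracted as X has independent columns, the PLS1 fit coincides with ordinary least squares; using fewer components is what gives the method its tolerance to collinear spectral variables.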

Note that this embodiment fits the coupling function between tea origin and time factors only for samples from a single origin in different time periods. Repeating the above steps gives the coupling function for tea from each origin in different time periods. All of these coupling functions are incorporated into the identification model.

In an embodiment of this application, step S600 includes:

S610: Receive the near-infrared characteristic spectral data of a tea sample.

S620: Use the fully convolutional neural network deep learning algorithm and the semi-supervised algorithm to select the optimal data processing algorithm for the near-infrared characteristic spectral data of the tea sample, and perform denoising and fusion to obtain the near-infrared characteristic spectrum of the tea sample.

S630: Use the principal component analysis algorithm and the successive projections algorithm to obtain the characteristic spectral signals of the tea sample.

S640: Feed the characteristic spectral signals of the tea sample into a coupling function between tea origin and time factors, and obtain a table of origin credibility values and ratios.

S650: Repeat the previous step until all of the coupling functions between tea origin and time factors in the identification model based on characteristic spectral signals in the near-infrared characteristic spectral data of tea samples have been traversed.

This embodiment concerns the construction of the identification model based on characteristic spectral signals in the near-infrared characteristic spectral data of tea samples. The model is built from NIR characteristic spectra, the coupling functions between tea origin and time factors, and chemometrics. Tea samples can be identified by near-infrared spectroscopy, which is well suited to analyzing their characteristic spectral signals simultaneously. NIR spectral data of the tea samples are collected, and abnormal samples are removed with robust principal component analysis (RPCA). The remaining spectra are partitioned with the SPXY method and preprocessed with Savitzky-Golay (SG) smoothing, first derivative, second derivative, SNV, MSC and similar operations, applied singly or in combination. The results show that different data processing algorithms have different effects on different tea samples. The SI-PLS algorithm is used to select characteristic spectral intervals from the NIR spectra of the tea. From the plot of the relative frequency of the selected intervals against spectral wavenumber, the NIR characteristic spectral intervals of the tea samples can be determined, and the selected intervals are closely related to the origin and time of production of the samples. Combining them with the EPHAH-PLSDA chemometric method greatly improves the performance of the multi-class model. The EPHAH-PLSDA model built for multi-class geographical traceability and time relationships of tea samples fits samples outside the sample set better.
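Of the preprocessing operations listed above, standard normal variate (SNV) is the simplest to show. The sketch below is illustrative: each spectrum is centered on its own mean and divided by its own standard deviation, which removes a constant offset and a multiplicative scale factor. The `first_difference` helper is only a crude stand-in for a first derivative and is an assumption; a Savitzky-Golay derivative would normally be used on real spectra.

```python
import math


def snv(spectrum):
    """Standard normal variate: per-spectrum centering and scaling."""
    m = sum(spectrum) / len(spectrum)
    s = math.sqrt(sum((v - m) ** 2 for v in spectrum) / (len(spectrum) - 1))
    return [(v - m) / s for v in spectrum]


def first_difference(spectrum):
    """Point-to-point difference, a simple approximation of the first derivative."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


original = [0.1, 0.4, 0.3, 0.8]
shifted = [2 * v + 5 for v in original]  # same shape, different offset and scale
same_after_snv = all(abs(a - b) < 1e-9 for a, b in zip(snv(original), snv(shifted)))
```

Two spectra that differ only by an offset and a scale factor become identical after SNV, which is why it is commonly applied before band selection.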

In an embodiment of this application, step S700 includes:

S710: Determine the fusion gradient.

S720: Select sample data for several different update gradients and add them to the tea sample pool of each origin.

S730: Update the origin tea sample pools and determine the optimal fusion gradient.

This embodiment concerns the training of the identification model. NIR spectral data are collected for 750 tea samples, and robust principal component analysis (RPCA) is used to remove 4 abnormal samples. The remaining 746 spectra are partitioned with the SPXY method into a training set of 520 samples and a test set of 226 samples, and training on about 70% of the data and testing on about 30% is repeated. Preprocessing is then applied using Savitzky-Golay (SG) smoothing, first derivative, second derivative, SNV, MSC and similar operations, singly or in combination. The SI-PLS algorithm is used to select characteristic spectral intervals from the NIR spectra of the tea samples. From the plot of the relative frequency of the selected intervals against spectral wavenumber, it can be seen that EPHAH effectively determines the NIR characteristic spectral intervals of the tea samples and that the selected intervals are closely related to the samples. Using the characteristic wavelengths selected by SI-PLS, the data are divided by the DUPLEX method into training, validation and test sets in the ratio 6:2:2, and an EPHAH-PLSDA model is built on the selected wavelength points, where EPHAH is the exhaustive and parallel half-against-half decomposition method. EPHAH-PLSDA is used to evaluate the agreement of origin and time for the tea samples, with a classification accuracy of 96.00%. The results show that combining SNV+SG+2D preprocessing and SI-PLS characteristic interval selection with the EPHAH-PLSDA chemometric method greatly improves the performance of the multi-class model. The EPHAH-PLSDA model built for multi-class geographical traceability and time of tea samples predicts samples outside the test set better. The method improves classification performance and provides a feasible way to evaluate the geographical origin and time of tea samples.

This application also provides a system for the tea traceability method based on the coupling relationship between tea origin and time factors.

As shown in Figure 2, in an embodiment of this application, the system for the tea traceability method based on the coupling relationship between tea origin and time factors includes a host computer 100 and a near-infrared detector 200.

The host computer 100 is used to execute the tea traceability method based on the coupling relationship between tea origin and time factors.

The near-infrared detector 200 is used to scan tea samples and acquire their near-infrared characteristic spectral data, and is communicatively connected to the host computer 100.

This embodiment concerns a system for the tea traceability method based on the coupling relationship between tea origin and time factors. The near-infrared detector 200 acquires the NIR spectral data of the tea samples, and the host computer 100 executes the traceability method. The focus is on the weak transfer and recognition ability, across time spans, of the identification model based on characteristic spectral signals in the near-infrared characteristic spectral data of tea samples. In the method executed by the host computer 100, deep learning algorithms represented by convolutional neural networks improve the model's transfer ability and discrimination ability. Data processing algorithms for several kinds of NIR spectral data are analyzed and compared, and the network structure parameters are optimized, thereby addressing the adaptability of the origin identification model over long time spans. With picking time as the representative time factor, the coupling between the picking-time factor and the origin factor is resolved.

The technical features of the above embodiments may be combined in any way, and the method steps are not limited to a particular order of execution. For brevity, not every possible combination of the technical features of the above embodiments has been described; however, any combination of these technical features that involves no contradiction should be considered within the scope of this specification.

The above embodiments represent only several implementations of this application. Although they are described specifically and in detail, they should not be construed as limiting the patent scope of this application. It should be noted that a person of ordinary skill in the art can make modifications and improvements without departing from the concept of this application, all of which fall within its scope of protection. The scope of protection of this application is therefore determined by the appended claims.