CN118041471B


Info

Publication number: CN118041471B
Application number: CN202410432093.9A
Authority: CN
Inventors: 胡珍珍; 文昭林; 张敏; 陈超; 邓永红; 魏培阳
Original assignee: Chengdu University of Information Technology
Current assignee: Chengdu University of Information Technology
Priority date: 2024-04-11
Filing date: 2024-04-11
Publication date: 2024-06-11
Anticipated expiration: 2044-04-11
Also published as: CN118041471A

Abstract

The present invention relates to the field of signal transmission technology, and in particular to a spectrum sensing method and system based on a machine learning logistic regression algorithm. The method comprises the following steps: S1, collecting the signals received by a cognitive node and parsing signal data from the signals; S2, preprocessing the parsed signal data, the preprocessing comprising, in sequence, feature extraction, nondimensionalization, and feature dimensionality reduction; S3, processing the preprocessed signal data with a pre-trained logistic regression model, which outputs a probability value that the spectrum is idle; if the probability value is greater than a threshold, the spectrum is judged to be idle, and if it is less than or equal to the threshold, the spectrum is judged to be not idle. The system comprises a data acquisition module, a preprocessing module, and a spectrum sensing module. Spectrum sensing performed with the present invention has high accuracy, a small computational load, and high sensing efficiency.

Description


Spectrum sensing method and system based on a machine learning logistic regression algorithm

Technical Field

The present invention relates to the field of signal transmission technology, and in particular to a spectrum sensing method and system based on a machine learning logistic regression algorithm.

Background

Spectrum sensing is one of the key technologies of cognitive radio, and its main function is to find spectrum holes. A cognitive radio senses the surrounding electromagnetic spectrum environment in real time and dynamically and intelligently selects the frequency band and mode of transmission so as to best meet users' needs. Spectrum sensing mainly involves the physical layer and the link layer: the physical layer is concerned with the specific local detection algorithms, while the link layer is concerned with cooperation between users and with the control and optimization of the sensing mechanism.

The main sensing algorithms include energy detection, matched filter detection, cyclostationary feature detection, covariance matrix detection, and cooperative sensing. To overcome the drawbacks of local detection and further improve detection performance, cooperative sensing has been studied widely and in depth. Current cooperative spectrum sensing methods mainly include centralized cooperative spectrum sensing, distributed cooperative spectrum sensing, and cooperative spectrum sensing based on energy detection. In centralized cooperative spectrum sensing, all secondary users (also called auxiliary users) send their sensing results to a central node, which then makes the decision. The advantage of this type is that the decision process is simple; the disadvantages are that transmitting the sensing results requires a large amount of communication resources and that the central node may become a bottleneck. In distributed cooperative spectrum sensing, the secondary users exchange sensing results directly and reach a decision through a distributed algorithm. The advantage of this type is that it saves communication resources; the disadvantage is that the decision process may be more complex. Most energy-detection-based cooperative spectrum sensing fuses sensing or decision results by weighted combining of hard or soft decisions; the combining methods include majority decision, equal gain combining (EGC), maximal ratio combining (MRC), and user selection.

Each of these methods has its own advantages and disadvantages, and a method can be chosen according to the specific application scenario. However, they share a common disadvantage: the computational load is relatively large, which leads to low processing efficiency.

Summary of the Invention

The object of the present invention is to provide a spectrum sensing method and system based on a machine learning logistic regression algorithm, so as to improve the processing efficiency of spectrum sensing.

To achieve the above object, the present invention provides the following technical solutions:

A spectrum sensing method based on a machine learning logistic regression algorithm, comprising the following steps:

S1, collecting the signals received by a cognitive node and parsing signal data from the signals;

S2, preprocessing the parsed signal data, the preprocessing comprising, in sequence, feature extraction, nondimensionalization, and feature dimensionality reduction;

S3, processing the preprocessed signal data with a pre-trained logistic regression model, which outputs a probability value that the spectrum is idle; if the probability value is greater than a threshold, the spectrum is judged to be idle, and if it is less than or equal to the threshold, the spectrum is judged to be not idle.

In the above solution, the signal data is processed with a logistic regression model. The algorithm is simple and requires little computation at classification time, so processing is fast and efficient and the storage requirement is low.

In S1, the parsed signal data includes application type, signal strength, required bandwidth, allocated bandwidth, signal energy, and signal delay. The logistic regression model includes a linear regression function and an activation function; the linear regression function serves as the input and the activation function as the output. The linear regression function is:

$$h(x) = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6 + b$$

The activation function is:

$$g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

where h(x) is the target value; x1, x2, x3, x4, x5, and x6 denote application type, signal strength, required bandwidth, allocated bandwidth, signal energy, and signal delay, respectively; w1, w2, w3, w4, w5, and w6 are the corresponding weights of application type, signal strength, required bandwidth, allocated bandwidth, signal energy, and signal delay; b is the adjustment coefficient; and θ^T x is the matrix form of the linear regression equation for the signal data.
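As a minimal sketch of how the linear function, the activation function, and the threshold decision of step S3 fit together, the Python below assumes the standard sigmoid as the activation function and assumes that the six features have already been preprocessed into numbers. The function names (`linear`, `sigmoid`, `is_idle`) are hypothetical and chosen only for illustration.

```python
import math

# Assumed feature order, following the text: application type, signal strength,
# required bandwidth, allocated bandwidth, signal energy, signal delay.

def linear(x, w, b):
    """Linear regression function h(x) = w1*x1 + ... + w6*x6 + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sigmoid(z):
    """Activation g(z) = 1 / (1 + e^-z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def is_idle(x, w, b, threshold):
    """Step S3: idle only if the probability is strictly greater than the threshold."""
    return sigmoid(linear(x, w, b)) > threshold
```

The strict comparison mirrors the wording of S3, where a probability equal to the threshold is judged as not idle.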

The above solution specifically selects the required bandwidth, the allocated bandwidth, and the delay of the signal as features, which makes it suitable for spectrum sensing of 5G and 6G signals.

During training of the logistic regression model, the output value of the activation function is first mapped into (-1, 1) according to the following formula:

$$\hat{g} = \frac{2\,(g - g_{min})}{g_{max} - g_{min}} - 1$$

The loss is then calculated with the following function:

$$L(y, \hat{g}) = \max\left(0,\ 1 - \frac{1}{\beta}\, y\, \hat{g}\right)$$

where $\hat{g}$ denotes the mapped output value of the activation function, g is the output value of the activation function, g_max and g_min are the maximum and minimum of the activation function output, y denotes the correct class and takes the value 1 or -1, and β is the value of the threshold mapped into (-1, 1). If $\frac{1}{\beta} y \hat{g} < 1$, the loss is $1 - \frac{1}{\beta} y \hat{g}$; if $\frac{1}{\beta} y \hat{g} \ge 1$, the loss is 0.

In the above solution, the predicted value is first mapped into (-1, 1) and the loss is then computed with the optimized loss function. This directs attention to the signals that are hard to classify and that ultimately cause false alarms or missed detections, which improves the predictive ability of the logistic regression model and yields more accurate output.

In S2, dictionary feature extraction is used to extract features from the signal data.

Data such as the application type, signal length, required and allocated bandwidth, and signal energy are all categorical features. Dictionary feature extraction converts the categorical, non-numeric data into numeric data, which the logistic regression algorithm can process more effectively.

In S2, the nondimensionalization includes normalization and z-score standardization.

In S2, principal component analysis is used to reduce the feature dimensionality of the z-score standardized data.

A spectrum sensing system based on a machine learning logistic regression algorithm, comprising:

a data acquisition module, configured to collect the signals received by a cognitive node and parse signal data from the signals;

a preprocessing module, configured to preprocess the parsed signal data, the preprocessing comprising, in sequence, feature extraction, nondimensionalization, and feature dimensionality reduction;

a spectrum sensing module, which processes the preprocessed signal data with a pre-trained logistic regression model and outputs a probability value that the spectrum is idle; if the probability value is greater than a threshold, the spectrum is judged to be idle, and if it is less than or equal to the threshold, the spectrum is judged to be not idle.

The signal data parsed by the data acquisition module includes application type, signal strength, required bandwidth, allocated bandwidth, signal energy, and signal delay. The logistic regression model includes a linear regression function and an activation function; the linear regression function serves as the input and the activation function as the output. The linear regression function is:

$$h(x) = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6 + b$$

The activation function is:

$$g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

The system further includes a model training module, configured to train the logistic regression model. During training, the output value g of the activation function is first mapped into (-1, 1) according to the following formula, where g_max and g_min are the maximum and minimum of the activation function output:

$$\hat{g} = \frac{2\,(g - g_{min})}{g_{max} - g_{min}} - 1$$

The preprocessing module uses dictionary feature extraction to extract features from the signal data.

Compared with the prior art, the present invention has the following beneficial effects:

Spectrum sensing faces the problems of improving accuracy at low signal-to-noise ratio and of keeping system performance stable under malicious attack. Compared with other algorithms applied to spectrum sensing, logistic regression is simple to implement, requires very little computation at classification time, is fast, and needs little storage. Its output is not only a class label but also a probability value, so per-sample probability scores can be inspected conveniently and the data optimized more sensibly. Training on existing labeled data makes the model's predictions more accurate, and spectrum sensing is in fact a binary classification problem, which matches logistic regression well.

Spectrum sensing is performed with a logistic regression algorithm that uses dictionary feature extraction and a Hinge loss function; the resulting model performs well and the sensing results are highly accurate.

Brief Description of the Drawings

FIG. 1 is a flow chart of the spectrum sensing method based on a machine learning logistic regression algorithm provided in the embodiment.

FIG. 2 is the ROC curve obtained by testing the logistic regression model on the test set.

FIG. 3 shows the evaluation results obtained by testing the logistic regression model on the test set.

FIG. 4 is a block diagram of the spectrum sensing system based on a machine learning logistic regression algorithm provided in the embodiment.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. The components of the embodiments described and shown in the drawings can generally be arranged and designed in a variety of configurations. The following detailed description of the embodiments provided in the drawings is therefore not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

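Referring to FIG. 1, the spectrum sensing method based on a machine learning logistic regression algorithm provided in this embodiment includes the following steps:

S1, collecting the signals received by a cognitive node and parsing signal data from the signals;

S2, preprocessing the parsed signal data, the preprocessing comprising feature extraction, nondimensionalization, and feature dimensionality reduction;

S3, processing the preprocessed signal data with a pre-trained logistic regression model, which outputs a probability value that the spectrum is idle; if the probability value is greater than a threshold, the spectrum is judged to be idle, and if it is less than or equal to the threshold, the spectrum is judged to be not idle.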

The method of this embodiment can be used to sense 5G or 6G spectrum signals. In S1, the parsed signal data includes application type, signal strength, required and allocated bandwidth, signal energy, and signal delay.

5G and 6G offer higher data rates, lower latency, and higher connection density. The bandwidth of a 5G network also depends on the spectrum range it uses. 6G is expected to provide higher-performance wireless connections and an improved user experience, with peak rates reaching the Tbps level and user-experienced rates of 10 to 100 Gbps. 6G will use the full spectrum from millimeter wave through THz to visible light. Compared with 4G, the bandwidth and delay data of 5G and 6G are more representative. The method of this embodiment therefore specifically selects the required bandwidth, the allocated bandwidth, and the delay of the signal as features.

In this embodiment, in S2, dictionary feature extraction is applied to the 5G or 6G signal data: non-numeric signal data is turned into a sparse matrix or a one-hot encoded matrix and thereby converted into numeric data. The application type, signal length, required and allocated bandwidth, signal energy, and signal delay are all categorical features. After dictionary feature extraction, the categorical, non-numeric data becomes numeric data that the logistic regression model can process more effectively, which in turn helps the model output more reliable classification results.
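The dictionary feature extraction described above can be sketched in plain Python as follows. This is an illustrative stand-in, not the patent's implementation: each record is assumed to be a dict, string values are expanded into one-hot columns, and numeric values are passed through. The field names `app_type` and `signal_strength` are hypothetical.

```python
def fit_columns(records):
    """Collect one column per (field, value) pair for strings, one per numeric field."""
    columns = set()
    for rec in records:
        for key, value in rec.items():
            columns.add(f"{key}={value}" if isinstance(value, str) else key)
    return sorted(columns)

def transform(records, columns):
    """Turn each record into a numeric row; categorical values become one-hot entries."""
    index = {name: i for i, name in enumerate(columns)}
    rows = []
    for rec in records:
        row = [0.0] * len(columns)
        for key, value in rec.items():
            name = f"{key}={value}" if isinstance(value, str) else key
            if name in index:
                row[index[name]] = 1.0 if isinstance(value, str) else float(value)
        rows.append(row)
    return rows

# Hypothetical sample records, for illustration only.
records = [
    {"app_type": "video", "signal_strength": -75.0},
    {"app_type": "voice", "signal_strength": -90.0},
]
columns = fit_columns(records)
matrix = transform(records, columns)
```

With many distinct categories most entries of each row are zero, which is why the text mentions a sparse matrix as one possible output form.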

The nondimensionalization process includes normalization and standardization. In this embodiment, 5G signal data is used; its features have different dimensions and units, and their orders of magnitude differ enormously. Regarding bandwidth, the 3GPP specifications define a maximum of 100 MHz per carrier for 5G, and as much as 400 MHz per carrier for millimeter wave. The specifications also allow 5G to aggregate up to 16 carriers, which means the required bandwidth of 5G can reach 1.6 GHz to 6.4 GHz. Signal strength, by contrast, usually lies between -50 dBm (very good signal) and -120 dBm (very poor signal or no coverage), and latency is on the order of milliseconds. 6G is still under research and development, and its signal strength, required bandwidth, and allocated bandwidth have not yet been specified; however, its signal strength is expected to be lower than that of 5G, its required and allocated bandwidth larger, and its latency on the order of microseconds. The feature values of 5G and 6G signals therefore differ by many orders of magnitude, which would seriously distort the results of data analysis. Nondimensionalization removes this effect of units and brings all features to the same order of magnitude, improving the comparability of the data.

Normalization is the range transformation method, also known as the min-max method: a linear transformation maps the numeric data obtained from feature extraction into the interval [0, 1].
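A minimal sketch of min-max normalization for a single feature column is given below. The handling of a constant column (all values mapped to 0) is an assumption added so the function never divides by zero; the text does not cover that case.

```python
def min_max_scale(values):
    """Linearly map a feature column into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention for a constant column; not specified in the text.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Signal strengths in dBm spanning the range quoted in the text.
scaled = min_max_scale([-120.0, -85.0, -50.0])
```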

Standardization uses the z-score, with the following formula:

$$X' = \frac{x - \mathrm{mean}}{\sigma}$$

where mean is the average value of the feature, σ is its standard deviation, x is the normalized data, and X' is the standardized data. After standardization, the data has a mean of 0 and a standard deviation of 1.
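The z-score step can be sketched as below. The population standard deviation (dividing by the number of samples) is assumed here, since the text does not say whether the population or sample form is used.

```python
import math

def z_score(values):
    """Standardize a feature column to mean 0 and standard deviation 1."""
    mean = sum(values) / len(values)
    # Population standard deviation; this choice is an assumption.
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if sigma == 0.0:
        return [0.0 for _ in values]
    return [(v - mean) / sigma for v in values]

standardized = z_score([0.0, 0.5, 1.0])
```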

In this embodiment, principal component analysis (PCA) is used to reduce the feature dimensionality of the standardized data. Specifically, the 5G signal data used in this embodiment is large in volume and contains a great deal of redundant information and noise. PCA can effectively reduce the dimensionality of the data, simplify it, and remove noise and redundant information, thereby improving the accuracy and reliability of the data. PCA mainly includes the following steps:

1) Remove the mean (centering): subtract from each feature its own mean.

2) Calculate the covariance matrix $C = \frac{1}{n} X X^T$, where, in the present invention, n is the number of feature types and X is the matrix formed from the feature data.

3) Use eigenvalue decomposition to find the eigenvalues and eigenvectors of the covariance matrix $C$.

4) Sort the eigenvalues from largest to smallest, select the k largest, and use the corresponding k eigenvectors as row vectors to form the eigenvector matrix P.

5) Transform the data (the data that makes up the matrix X) into the new space spanned by the k eigenvectors, that is, Y = PX.

Feature dimensionality reduction keeps the most important features of the standardized high-dimensional data and removes noise and unimportant features. The reduced data can then be fed into the logistic regression model.
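The five PCA steps above can be sketched in dependency-free Python as follows. Two points are assumptions made for illustration: the eigenvectors are found by power iteration with deflation, which is swapped in here for a full eigenvalue decomposition, and the covariance matrix is scaled by the number of samples (the scale factor does not change the eigenvectors). X is laid out with one row per feature so that the projection reads Y = PX.

```python
import math

def pca(X, k, iters=200):
    """X has one row per feature and one column per sample. Returns (P, Y)."""
    n_feat, n_samp = len(X), len(X[0])
    # Step 1: centre each feature (row) on its own mean.
    Xc = []
    for row in X:
        mu = sum(row) / n_samp
        Xc.append([v - mu for v in row])
    # Step 2: covariance matrix of the centred data.
    C = [[sum(a * b for a, b in zip(Xc[i], Xc[j])) / n_samp
          for j in range(n_feat)] for i in range(n_feat)]
    # Steps 3 and 4: k leading eigenvectors, largest eigenvalue first.
    P = []
    for _ in range(k):
        start = [1.0 / (i + 1) for i in range(n_feat)]  # arbitrary non-zero start
        s = math.sqrt(sum(x * x for x in start))
        v = [x / s for x in start]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(n_feat)) for i in range(n_feat)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(n_feat))
                  for i in range(n_feat))
        P.append(v)
        # Deflate: remove the component just found before looking for the next one.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(n_feat)]
             for i in range(n_feat)]
    # Step 5: project the centred data, Y = P * Xc.
    Y = [[sum(p[i] * Xc[i][s] for i in range(n_feat)) for s in range(n_samp)]
         for p in P]
    return P, Y

# Two perfectly correlated features collapse onto one principal component.
P, Y = pca([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]], k=1)
```

Power iteration is adequate for a handful of features; for larger problems a library eigensolver would normally be preferred.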

The logistic regression model used in S3 is obtained by training on a large amount of data. After a large number of signals have been collected, they are processed in the same way as in S1 and S2: signal data is first parsed from the collected signals and then preprocessed, and the result is used as training samples for the regression model. Each training sample must be labeled with feature values and a target value. The application type, signal strength, required and allocated bandwidth, signal energy, and signal delay serve as the feature values, and whether the spectrum is idle serves as the target value.

The logistic regression model first makes a linear prediction from the input data, feeds the result into the activation function, and then classifies according to the output. The linear equation of the logistic regression in this embodiment is:

$$h(x) = w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4 + w_5 x_5 + w_6 x_6 + b$$

where h(x) is the target value; x1, x2, x3, x4, x5, and x6 denote application type, signal strength, required bandwidth, allocated bandwidth, signal energy, and signal delay, respectively; w1, w2, w3, w4, w5, and w6 are the corresponding weights of application type, signal strength, required bandwidth, allocated bandwidth, signal energy, and signal delay; and b is the adjustment coefficient.

The loss function of the linear regression is:

$$J(x) = \sum_{i=1}^{m} \left(h_w(x_i) - y_i\right)^2$$

where y_i is the true value of the i-th training sample, that is, the target value of that sample; h_w(x_i) is the value predicted from the feature values of the i-th training sample; m is the number of samples; and i is the sample index, i = 1, 2, …, m. The resulting J(x) is the loss value.

In this embodiment, gradient descent is used to optimize the initial logistic regression model and solve for the weights W that minimize the loss. The gradient descent update is:

$$w_i \leftarrow w_i - \alpha \frac{\partial J(x)}{\partial w_i}$$

where w_i is a weight coefficient of the linear equation, J(x) is the loss function, and α is the learning rate. For signal processing, given the large amount of information in the signal data, the learning rate α is set to 0.7 to 1 to obtain fast iteration; the specific value can be adjusted as needed.
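A minimal sketch of one gradient descent update is shown below, assuming the loss is the plain sum of squared errors and treating the adjustment coefficient b as one more parameter to update (an assumption; the text only mentions the weights). The toy data and the step size of 0.1 are illustrative: the text suggests 0.7 to 1, but with an unaveraged sum over this particular data a step that large would overshoot.

```python
def predict(x, w, b):
    """Linear prediction h_w(x)."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def squared_loss(X, y, w, b):
    """J = sum over samples of (h_w(x_i) - y_i)^2."""
    return sum((predict(x, w, b) - yi) ** 2 for x, yi in zip(X, y))

def gradient_step(X, y, w, b, alpha):
    """One update w_j <- w_j - alpha * dJ/dw_j (and likewise for b)."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, yi in zip(X, y):
        err = predict(x, w, b) - yi
        for j, xj in enumerate(x):
            grad_w[j] += 2.0 * err * xj
        grad_b += 2.0 * err
    new_w = [wj - alpha * gj for wj, gj in zip(w, grad_w)]
    return new_w, b - alpha * grad_b

# Toy data, for illustration only: one feature, target equal to the feature.
X = [[0.0], [0.5], [1.0]]
y = [0.0, 0.5, 1.0]
w, b = [0.0], 0.0
loss_before = squared_loss(X, y, w, b)
for _ in range(50):
    w, b = gradient_step(X, y, w, b, alpha=0.1)
loss_after = squared_loss(X, y, w, b)
```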

The optimized linear regression result is fed into the activation function:

$$g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$

where g(θ^T x) is the probability value produced by feeding the regression result into the activation function, θ^T x is the matrix form of the linear regression equation for the signal data, and θ^T is the weight matrix of that linear equation.

Feeding the regression result into the activation function yields a probability value in the interval [0, 1]. For signal processing, to achieve a higher prediction rate and reduce the probabilities of false alarm and missed detection, the threshold is set to 0.8: if the probability value is greater than the threshold, the spectrum is judged to be idle; otherwise it is judged to be not idle. The loss of the conventional logistic regression algorithm is called the log-likelihood loss, with the following formula:

$$\mathrm{cost} = -\sum_{i=1}^{m} \left[\, y_i \log g(\theta^T x_i) + (1 - y_i) \log\left(1 - g(\theta^T x_i)\right) \right]$$

where y_i is the true value of each sample, that is, its target value, and g(θ^T x_i) is the probability value produced by feeding the regression result for that sample into the activation function.

y is the true label. When y = 1, the closer the predicted value g(θ^T x) is to 1, the smaller the loss; conversely, when y = 0, the closer the predicted value is to 0, the smaller the loss. Gradient descent is used to update the weights of the logistic regression model so as to reduce the loss, raising the predicted probability for samples that belong to class 1 and lowering it for samples that belong to class 0.
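The behaviour of the log-likelihood loss described above can be checked with a short sketch. The clipping constant `eps` is an assumption added to keep the logarithm finite when a predicted probability is exactly 0 or 1; it is not part of the text.

```python
import math

def log_likelihood_loss(probs, labels, eps=1e-12):
    """Sum over samples of -[y*log(p) + (1-y)*log(1-p)], with labels in {0, 1}."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total
```

For a sample with y = 1, a prediction of 0.9 gives a smaller loss than a prediction of 0.6; for y = 0 the ordering is reversed, matching the description.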

The present invention replaces the log-likelihood loss with the Hinge Loss function. In theory, both the Hinge Loss and the logistic loss can be used to optimize binary classification problems. The main difference between them lies in how they treat misclassified samples. The Hinge Loss function attends only to samples that are hard to classify or are misclassified; for samples that are classified correctly with sufficiently high confidence, its loss is 0. In spectrum sensing, the signals that are hard to classify and that ultimately cause false alarms or missed detections deserve the most attention. Therefore, when the data set contains a lot of noise, or when samples near the classification boundary matter most (that is, signal data that is hard to judge and prone to false alarms or missed detections), the Hinge Loss function gives better results.

The present invention therefore uses the Hinge Loss function in place of the log-likelihood loss, and also optimizes the Hinge Loss function. The optimized Hinge Loss function is:

$$L(y, \hat{g}) = \max\left(0,\ 1 - \frac{1}{\beta}\, y\, \hat{g}\right)$$

where $\hat{g}$ denotes the output value of the activation function after mapping into (-1, 1). It is usually a soft result, meaning that the output is not exactly -1 or 1 but any value between -1 and 1. y denotes the correct class, usually represented as -1 or 1; in this case, y = 1 means the spectrum is idle and y = -1 means the spectrum is busy. β is the value of the activation function's threshold mapped into (-1, 1). If $\frac{1}{\beta} y \hat{g} < 1$, the loss is $1 - \frac{1}{\beta} y \hat{g}$; if $\frac{1}{\beta} y \hat{g} \ge 1$, the loss is 0. With the traditional Hinge Loss function, if the input is a probability value, the loss can only approach 0 and never equal 0. By adding the coefficient $\frac{1}{\beta}$, which is set according to the threshold, the optimized Hinge Loss function sets the loss of correctly predicted samples that exceed the prediction threshold directly to 0, so that training concentrates on the data that is hard to judge.

Before this loss function can be used, the output of the activation function must be processed further. In logistic regression, the value finally produced by the activation function is a probability between 0 and 1, which satisfies the soft-result requirement. However, its range is (0, 1), while the correct class y takes the values -1 and 1; with the raw probability, the loss for y = -1 cannot be evaluated correctly, so the range must be mapped to (-1, 1).

The output of the activation function is normalized with the following formula:

$$\hat{g} = \frac{2\,(g - g_{min})}{g_{max} - g_{min}} - 1$$

This formula maps the result into (-1, 1). g is the output value of the activation function, and g_max and g_min are the maximum and minimum of the activation function output. With the preset activation threshold of 0.8, the mapped value β is 0.6, so the coefficient 1/β is approximately 1.66. The processed result can then be passed to the optimized Hinge Loss function for loss evaluation. The model is optimized iteratively according to the loss until the loss reaches its minimum, at which point training is complete.
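The mapping and the optimized Hinge Loss can be sketched together as follows. The sketch assumes that the mapping is the linear rescaling 2(g − g_min)/(g_max − g_min) − 1 and that the loss has the form max(0, 1 − y·ĝ/β); both forms are inferred from the worked numbers in the text (threshold 0.8 maps to β = 0.6, coefficient about 1.66) rather than quoted from it. The function names are hypothetical.

```python
def map_to_signed(g, g_min=0.0, g_max=1.0):
    """Map an activation output from [g_min, g_max] into [-1, 1]."""
    return 2.0 * (g - g_min) / (g_max - g_min) - 1.0

def modified_hinge(g, y, threshold=0.8, g_min=0.0, g_max=1.0):
    """Hinge loss scaled by 1/beta, where beta is the mapped threshold.

    g is the raw activation output, y is +1 (idle) or -1 (busy).
    """
    beta = map_to_signed(threshold, g_min, g_max)
    g_hat = map_to_signed(g, g_min, g_max)
    return max(0.0, 1.0 - y * g_hat / beta)
```

Under these assumptions, an idle sample (y = 1) predicted at 0.9 lies beyond the threshold and contributes zero loss, while the same sample predicted at 0.7 falls short of the threshold and is still penalized.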

After training, the logistic regression model is evaluated so that its parameters can be optimized further. The model is evaluated by computing its precision, recall, F1-score, ROC curve, and AUC. Precision is the proportion of samples predicted as positive that are truly positive; recall is the proportion of truly positive samples that are predicted as positive. The F1-score is given by:

$$F1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

Precision denotes the precision and Recall denotes the recall. The F1-score is twice the product of precision and recall divided by their sum, that is, their harmonic mean; the higher the precision and recall, the higher the F1-score and the more stable the model. The ROC curve is a plot of the true positive rate (also called recall or sensitivity) against the false positive rate at different thresholds. The horizontal axis is the false positive rate (FPR), the proportion of actually negative samples that are wrongly predicted as positive; the vertical axis is the true positive rate (TPR), the proportion of actually positive samples that are correctly predicted as positive. As the model threshold changes, the true positive rate and false positive rate change, and the ROC curve traces this change. AUC is the area under the ROC curve and represents the model's ability to separate positive from negative examples. For a model that performs no worse than random guessing, the AUC lies between 0.5 and 1, and the closer it is to 1, the better the model performs.
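The metrics defined above can be computed directly from labels and scores. The following is a minimal, dependency-free sketch rather than the evaluation code used in the patent: the function names are hypothetical, labels are encoded as 1 (idle, positive) and 0 (busy, negative) for convenience, and the AUC is computed through the equivalent pairwise-ranking formulation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half).

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up labels and scores, for illustration only.
    labels = [1, 1, 1, 0, 0, 0]
    probs = [0.9, 0.85, 0.4, 0.7, 0.3, 0.2]
    preds = [1 if s > 0.8 else 0 for s in probs]  # threshold 0.8 as in the text
    print(precision_recall_f1(labels, preds))
    print(roc_auc(labels, probs))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a test set of about one hundred signals; a sort-based computation would be preferable for large data sets.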

The preprocessed signal data of 400 signals are used as the data set, with the data of 300 signals as the training set and the data of 100 signals as the test set. The logistic regression model is first trained on the training set and then tested on the test set. The test results are shown in FIG. 2 and FIG. 3. With prediction by the logistic regression algorithm, the precision and recall for the idle state both exceed 0.93, the F1-score exceeds 0.92, and the computed AUC reaches 0.95, which is close to 1 and indicates good prediction performance. In other words, the spectrum sensing method described in this embodiment is highly reliable.
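The train-then-test procedure can be illustrated end to end with a toy example. Everything specific here is an assumption made for the sketch and not taken from the patent: the data are synthetic (one made-up feature per signal, drawn from two Gaussians), the optimizer is plain batch gradient descent, the loss is the 1/β-scaled hinge form applied to the sigmoid output mapped to (-1, 1), and all function names are hypothetical. The results it produces say nothing about the figures reported in this embodiment.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_toy_dataset(n: int, seed: int = 0):
    """Synthetic (feature, label) pairs; label +1 = idle, -1 = busy."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = 1 if i % 2 == 0 else -1
        data.append((rng.gauss(2.0 * y, 0.5), y))
    rng.shuffle(data)
    return data


def train(samples, beta: float = 0.6, lr: float = 0.5, epochs: int = 200):
    """Batch gradient descent on max(0, 1 - y * y_hat / beta)."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in samples:
            p = sigmoid(w * x + b)
            y_hat = 2.0 * p - 1.0            # map (0, 1) onto (-1, 1)
            if y * y_hat < beta:             # only these samples incur loss
                # dL/dz = -(y / beta) * d(y_hat)/dp * dp/dz
                common = -(y / beta) * 2.0 * p * (1.0 - p)
                grad_w += common * x
                grad_b += common
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def accuracy(samples, w: float, b: float, threshold: float = 0.8) -> float:
    """Fraction of samples classified correctly with the idle threshold."""
    correct = 0
    for x, y in samples:
        pred = 1 if sigmoid(w * x + b) > threshold else -1
        correct += int(pred == y)
    return correct / len(samples)


data = make_toy_dataset(400)
train_set, test_set = data[:300], data[300:]   # 300 / 100 split as in the text
w, b = train(train_set)
acc = accuracy(test_set, w, b)

if __name__ == "__main__":
    print(f"w = {w:.3f}, b = {b:.3f}, test accuracy = {acc:.2f}")
```

Because correctly classified samples beyond the threshold contribute no gradient, the updates in later epochs come only from the samples that are still close to the decision boundary.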

Referring to FIG. 4, based on the same inventive concept, this embodiment also provides a spectrum sensing system based on a machine learning logistic regression algorithm, comprising:

a model training module, which trains the logistic regression model on the preprocessed sample data;

For the specific processing methods or procedures of each functional module, refer to the corresponding description of the method above. For example, the preprocessing module uses the dictionary feature extraction method to extract features from the signal data; likewise, during training of the logistic regression model, the model training module first maps the output value of the activation function into (-1, 1) and then computes the loss with the optimized Hinge Loss function. For brevity, the details are not repeated here.

The above is only a specific embodiment of the present invention, and the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.