CN114529890A - State detection method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114529890A
Authority
CN
China
Prior art keywords
state
image
network
feature
detected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210174065.2A
Other languages
Chinese (zh)
Other versions
CN114529890B (en)
Inventor
潘蓬
谭昶
贾若然
郑爱华
张友国
吕军
胡少云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Anhui University
Iflytek Information Technology Co Ltd
Original Assignee
iFlytek Co Ltd
Anhui University
Iflytek Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd, Anhui University, Iflytek Information Technology Co Ltd
Priority to CN202210174065.2A
Publication of CN114529890A
Application granted
Publication of CN114529890B
Status: Active
Anticipated expiration

Abstract

The invention provides a state detection method, a state detection apparatus, an electronic device, and a storage medium. The method comprises: determining an image to be detected; based on a state detection model, adaptively locating a state-related region in the image to be detected through spatial transformation, and performing person state detection on the image to be detected using the state-related region; the state detection model is trained on sample images and person state category labels of those sample images. By spatially transforming the features of the image to be detected, the method, apparatus, electronic device, and storage medium can adaptively locate the state-related region in the image and then perform person state detection on that region. Because the state-related region itself is the detection target, errors in subsequent state category detection caused by fixed-region detection are reduced, and the accuracy of state category detection is improved.

Description

Translated from Chinese

State detection method, device, electronic device and storage medium

Technical Field

The present invention relates to the technical field of machine vision, and in particular to a state detection method, device, electronic device, and storage medium.

Background Art

At present, in the field of driver assistance, driver state detection is an important component of assisted driving systems; it aims to alert the driver to danger inside the vehicle and reduce risk factors. Driver state detection mainly detects driver distraction, such as smoking, talking on the phone, or fatigued driving; when such conditions are detected, the assisted driving system issues a voice or warning-light alert.

Existing driver state detection mainly extracts fixed ROI regions (regions of interest) from the image, such as the eye, mouth, and hand regions, and then detects the person's state from the local features of those ROI regions. However, detecting ROI regions with a key-point detection mechanism unrelated to the state can cause ROI recognition errors that in turn make the person state category detection wrong; even when ROI recognition succeeds, sparse features in the ROI region can still lead to incorrect state category results.

SUMMARY OF THE INVENTION

The present invention provides a state detection method, device, electronic device, and storage medium, to address the defect in the prior art that person state category detection based on ROI regions easily produces wrong detection results.

The present invention provides a state detection method, comprising:

determining an image to be detected;

based on a state detection model, adaptively locating a state-related region in the image to be detected through spatial transformation, and performing person state detection on the image to be detected using the state-related region;

wherein the state detection model is trained on sample images and person state category labels of the sample images.

According to a state detection method provided by the present invention, adaptively locating the state-related region in the image to be detected through spatial transformation based on the state detection model, and performing person state detection on the image to be detected using the state-related region, comprises:

based on a state localization network in the state detection model, adaptively locating the state-related region in the image to be detected through spatial transformation to obtain a state localization feature, and determining an image feature of the state-related region from the state localization feature and a convolutional feature of the image to be detected, where the state localization feature indicates the position of the state-related region in the image to be detected;

based on a classification network in the state detection model, applying the image feature of the state-related region to perform person state detection on the image to be detected.

According to a state detection method provided by the present invention, obtaining the state localization feature based on the state localization network, and determining the image feature of the state-related region from the state localization feature and the convolutional feature of the image to be detected, comprises:

based on a multi-layer convolutional network in the state localization network, performing feature extraction on the image to be detected to obtain the convolutional feature output by each layer of the multi-layer convolutional network;

based on a spatial transformer network in the state localization network, applying the convolutional feature output by the current layer together with the spatially transformed feature from the previous layer to perform a spatial transformation, obtaining the spatially transformed feature of the current layer, until the spatially transformed feature of the last layer is obtained; determining the last layer's spatially transformed feature as the state localization feature, and determining the image feature of the state-related region from the state localization feature and the convolutional feature output by the last convolutional layer.

According to a state detection method provided by the present invention, applying the convolutional feature output by the current layer and the spatially transformed feature from the previous layer to obtain the current layer's spatially transformed feature comprises:

fusing the convolutional feature output by the current layer with the state localization feature obtained from the previous layer's spatial transformation to obtain a fused feature for the current layer, and spatially transforming that fused feature to obtain the state localization feature of the current layer.

According to a state detection method provided by the present invention, performing person state detection on the image to be detected based on the classification network, applying the image feature of the state-related region, comprises:

based on an illumination-aware network in the classification network, performing illumination intensity equalization on the image feature of the state-related region to obtain an equalized feature of the state-related region;

based on a state classification network in the classification network, applying the equalized feature of the state-related region to perform person state detection on the image to be detected.

According to a state detection method provided by the present invention, performing illumination intensity equalization on the image feature of the state-related region based on the illumination-aware network in the classification network, to obtain the equalized feature of the state-related region, comprises:

based on a strong-light branch and a weak-light branch of the illumination-aware network, separately extracting illumination features from the image feature of the state-related region to obtain a strong-light feature and a weak-light feature of the state-related region;

based on a weight fusion branch of the illumination-aware network, predicting an illumination intensity weight from the image feature of the state-related region; and, based on the illumination intensity weight, weighting the strong-light feature and the weak-light feature to obtain the equalized feature of the state-related region.
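As an illustrative sketch (not part of the patent text), the two-branch fusion described above can be modeled as follows. The projections, the `tanh` branches, and the sigmoid gate are hypothetical stand-ins for the learned strong-light, weak-light, and weight fusion branches:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def illumination_equalize(region_feat, w_strong, w_weak, w_gate):
    """Blend a strong-light and a weak-light branch output by a
    predicted illumination-intensity weight in (0, 1)."""
    strong = np.tanh(region_feat @ w_strong)   # strong-light feature
    weak = np.tanh(region_feat @ w_weak)       # weak-light feature
    alpha = sigmoid(region_feat @ w_gate)      # illumination intensity weight
    return alpha * strong + (1.0 - alpha) * weak

# Random stand-in weights; in the model these would be trained parameters.
rng = np.random.default_rng(0)
d = 8
feat = rng.random(d)
balanced = illumination_equalize(
    feat, rng.random((d, d)), rng.random((d, d)), rng.random(d))
```

Because the gate is a convex combination, the equalized feature stays within the range of the two branch outputs, which is the intuition behind "equalizing" features captured under different lighting.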

According to a state detection method provided by the present invention, the state detection model is trained as follows:

determining an initial detection model comprising an initial state localization network, an initial illumination-aware network, and an initial state classification network;

training the initial detection model on the sample images and their person state category labels to obtain the state localization network and the illumination-aware network;

determining an intermediate detection model comprising the state localization network, the illumination-aware network, and the initial state classification network;

fixing the parameters of the state localization network and the illumination-aware network, and training the intermediate detection model on the sample images and their person state category labels to obtain the state classification network.

According to a state detection method provided by the present invention, training the intermediate detection model with the parameters of the state localization network and the illumination-aware network fixed, to obtain the state classification network, comprises:

determining a weight for each person state category based on the number of sample images under that category;

computing a loss from the state classification results output by the intermediate detection model for the sample images, the person state category labels of the sample images, and the per-category weights, and updating the parameters of the initial classification network based on the loss until the loss converges, yielding the state classification network.
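A minimal sketch of the class-weighted loss step above. The inverse-frequency weighting is one plausible choice; the patent does not fix the exact weighting formula:

```python
import numpy as np

def class_weights(sample_counts):
    """Per-category weights from sample counts: rarer categories
    get larger weights (inverse-frequency, a common heuristic)."""
    counts = np.asarray(sample_counts, dtype=float)
    return counts.sum() / (len(counts) * counts)

def weighted_cross_entropy(logits, label, weights):
    """Weighted cross-entropy for one sample: the usual log-softmax
    loss scaled by the weight of the true category."""
    z = logits - logits.max()                  # numerically stable
    log_probs = z - np.log(np.exp(z).sum())
    return -weights[label] * log_probs[label]

# e.g. 'normal' driving dominates the data; distraction classes are rare
w = class_weights([1000, 100, 50])
loss = weighted_cross_entropy(np.array([2.0, 0.5, -1.0]), 1, w)
```

Scaling the loss by the true category's weight makes mistakes on rare categories (e.g. smoking) cost more than mistakes on the dominant category, counteracting class imbalance during training.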

The present invention also provides a state detection device, comprising:

a determination module for determining an image to be detected;

a detection module for adaptively locating, based on a state detection model, a state-related region in the image to be detected through spatial transformation, and performing person state detection on the image to be detected based on the state-related region;

wherein the state detection model is trained on sample images and person state category labels of the sample images.

The present invention also provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of any of the state detection methods described above.

The present invention also provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the state detection methods described above.

The present invention also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of any of the state detection methods described above.

By spatially transforming the features of the image to be detected, the state detection method, device, electronic device, and storage medium provided by the present invention adaptively locate the state-related region in the image and then perform person state detection on that region. With the state-related region as the detection target, the region relevant to the state is obtained directly from the image to be detected, reducing errors in subsequent person state category detection caused by fixed-region detection and improving the accuracy of person state category detection.

BRIEF DESCRIPTION OF THE DRAWINGS

To explain the technical solutions of the present invention or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is the first schematic flowchart of the state detection method provided by the present invention;

Fig. 2 is a schematic flowchart of state detection by the state detection model provided by the present invention;

Fig. 3 is a schematic flowchart of the method for obtaining the state localization feature provided by the present invention;

Fig. 4 is the second schematic flowchart of the state detection method provided by the present invention;

Fig. 5 is a schematic flowchart of the method for obtaining the equalized feature provided by the present invention;

Fig. 6 is a schematic flowchart of the state detection model training method provided by the present invention;

Fig. 7 is a schematic flowchart of the training method for the state classification network provided by the present invention;

Fig. 8 is a network framework diagram of the state detection model provided by the present invention;

Fig. 9 is a structural diagram of the STN network provided by the present invention;

Fig. 10 is a network framework diagram of the illumination-aware network provided by the present invention;

Fig. 11 is a schematic structural diagram of the state detection device provided by the present invention;

Fig. 12 is a schematic structural diagram of the electronic device provided by the present invention.

DETAILED DESCRIPTION

To make the objectives, technical solutions, and advantages of the present invention clearer, the technical solutions of the present invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on these embodiments, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

At present, driver state detection mainly relies on face detection or detection of designated regions, after which driver state detection is performed on the detected ROI regions, for example detecting the eye and mouth ROI regions within the face region of the image to be detected, or detecting the hand ROI region. However, this approach does not take the driver state itself as the detection target: an error in ROI detection propagates into the state detection result, and even when the ROI is detected correctly, sparse ROI features still give the state detection result a high error rate.

Therefore, how to take the state itself as the detection target, so as to improve the accuracy of person state category detection, is a technical problem urgently needing a solution in this field.

In view of the above technical problems, an embodiment of the present invention provides a state detection method. Fig. 1 is the first schematic flowchart of the state detection method provided by the present invention. As shown in Fig. 1, the method can be applied to driver state detection scenarios, and the following embodiments are all described in terms of driver state detection. The method can also be applied to similar person state detection scenarios, for example detecting whether students are paying attention in class or monitoring the working state of assembly-line workers. The method includes:

Step 110: determine the image to be detected.

Specifically, the image to be detected may be a current image of the driver captured in real time by a camera, or a frame from a video of the driver; the embodiment of the present invention does not limit this.

Step 120: based on a state detection model, adaptively locate the state-related region in the image to be detected through spatial transformation, and perform person state detection on the image to be detected using the state-related region.

The state detection model is trained on sample images and person state category labels of the sample images.

To avoid the problems that can arise from using fixed regions of interest for person state detection, the embodiment of the present invention adaptively locates the state-related region according to the features of the image to be detected itself. Here, the state-related region is the image region relevant to the person's state in that particular image. For example, in an image of a person dozing off, the hand region may not matter while the eye region needs to be located; in an eye-rubbing action, the region where hand and eyes overlap is the state-related region. That is, the state-related region may differ from image to image. Adaptively locating the state-related region in the image to be detected, i.e. locating it with state detection as the guiding objective and performing person state detection accordingly, yields accurate localization of the state-related region.

To realize adaptive localization of the state-related region, the embodiment of the present invention introduces spatial transformation into the person state detection process of the state detection model. Concretely, the spatial transformation can be implemented as a spatial transformer network. In a state detection model trained on sample images and their person state category labels, the spatial transformer network learns how to locate, within the input image, the regions useful for person state detection; that is, it acquires the ability to adaptively locate state-related image regions. Person state detection can therefore be performed on the image to be detected via the state-related regions adaptively located by spatial transformation.
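To make the spatial transformation concrete, here is a minimal NumPy sketch of the core of a spatial transformer: a 2×3 affine matrix maps an output grid back to source coordinates, and sampling the feature map at those coordinates crops/warps a region. The nearest-neighbour sampling and the fixed `theta` are simplifications (a real STN predicts `theta` from the features and uses bilinear sampling):

```python
import numpy as np

def affine_grid(theta, h, w):
    """Build an (h, w, 2) sampling grid in normalized [-1, 1] coords
    from a 2x3 affine matrix theta, as in a spatial transformer."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, h),
                         np.linspace(-1, 1, w), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (h, w, 3)
    return coords @ theta.T  # (h, w, 2): target -> source mapping

def sample(feature, grid):
    """Nearest-neighbour sampling of a (H, W) feature map at grid positions."""
    h, w = feature.shape
    x = np.clip(((grid[..., 0] + 1) * 0.5 * (w - 1)).round().astype(int), 0, w - 1)
    y = np.clip(((grid[..., 1] + 1) * 0.5 * (h - 1)).round().astype(int), 0, h - 1)
    return feature[y, x]

# A scale of 0.5 zooms into the centre of the feature map, i.e. it
# "locates" a central region of interest.
feat = np.arange(16, dtype=float).reshape(4, 4)
theta = np.array([[0.5, 0.0, 0.0],
                  [0.0, 0.5, 0.0]])
roi = sample(feat, affine_grid(theta, 2, 2))  # central 2x2 region
```

Because `theta` is produced by a differentiable sub-network in an STN, the localization itself is trained end-to-end from the state category labels, which is what gives the model its adaptive localization ability.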

The person state detection result thus obtained may be a probability for each person state category, or the person state category may be output directly, and whether to issue a warning can be decided from the detection result; the embodiment of the present invention does not limit this.

Before step 120 is performed, the state detection model must be trained in advance. Specifically, during training, sample images are fed into the model to obtain the person state detection result it outputs for each. This result is compared with the person state category label of the sample image to obtain a training loss value, and the model parameters are iteratively updated based on that loss. In this process the model learns the correspondence between sample images and person state detection results, so that the trained state detection model is able to perform person state detection on the image to be detected via regions adaptively located through spatial transformation.

It should be noted that the spatial transformation in the state detection model may be applied once or multiple times; the embodiment of the present invention does not limit this. The located state-related region may be one region or several, and person state detection via the state-related region may operate directly on the local features of that region, or on image features determined from both the local features of the region and the convolutional features of the image to be detected; the embodiment of the present invention does not limit this either.

The state detection method provided by the embodiment of the present invention spatially transforms the features of the image to be detected to adaptively locate the state-related region, and then performs person state detection on the image via that region. With the state-related region as the detection target, the region relevant to the state is obtained directly from the image to be detected, reducing errors in subsequent person state category detection caused by fixed-region detection and improving the accuracy of person state category detection.

Based on the above embodiment, Fig. 2 is a schematic flowchart of state detection by the state detection model provided by the present invention. As shown in Fig. 2, step 120 includes:

Step 121: based on the state localization network in the state detection model, adaptively locate the state-related region in the image to be detected through spatial transformation to obtain the state localization feature, and determine the image feature of the state-related region from the state localization feature and the convolutional feature of the image to be detected; the state localization feature indicates the position of the state-related region in the image to be detected.

Step 122: based on the classification network in the state detection model, apply the image feature of the state-related region to perform person state detection on the image to be detected.

Specifically, the image to be detected is input into the state detection model; the state localization network detects the state-related region through spatial transformation and produces a state localization feature indicating where that region lies in the image. Since the subsequent classification network can detect state more accurately from the perspective of global features, the state localization feature is fused here with the convolutional feature of the image to be detected to obtain the image feature of the state-related region; the classification network then applies this feature to perform person state detection and obtain the result. The convolutional feature of the image to be detected refers to the feature finally produced by passing the image through a convolutional network, which reflects the image as a whole.

Based on the above embodiment, Fig. 3 is a schematic flowchart of the method for obtaining the state localization feature provided by the present invention. As shown in Fig. 3, step 121 includes:

Step 310: based on the multi-layer convolutional network in the state localization network, perform feature extraction on the image to be detected to obtain the convolutional feature output by each layer of the multi-layer convolutional network.

Step 320: based on the spatial transformer network in the state localization network, apply the convolutional feature output by the current layer and the spatially transformed feature from the previous layer to perform a spatial transformation and obtain the current layer's spatially transformed feature, until the last layer's spatially transformed feature is obtained; determine the last layer's spatially transformed feature as the state localization feature, and determine the image feature of the state-related region from the state localization feature and the convolutional feature output by the last layer.

As a convolutional network deepens, the resolution of deep convolutional features decreases, causing fine-grained information in the image to be lost; shallow convolutional features, though rich in detail, lack contextual mutual information, i.e., they carry little semantics. Therefore, in this embodiment of the present invention, the state localization feature is obtained in a progressive manner through multiple spatial transformations applied to convolutional layers of different depths.

Specifically, in step 310, each convolutional layer of the multi-layer convolutional network in the state localization network performs feature extraction on the convolutional feature output by its previous layer, yielding the convolutional feature corresponding to that layer. The first convolutional layer performs feature extraction directly on the image to be detected to obtain its convolutional feature. The convolutional layers here are virtual: each may contain one or more physical convolutional layers, and different virtual layers may contain the same or different numbers of physical layers, which is not limited in this embodiment of the present invention.

In step 320, the spatial transformation network in the state localization network performs spatial transformations layer by layer, from shallow to deep, over the convolutional features obtained in step 310. It applies the convolutional feature output by the current convolutional layer and the spatial transformation feature obtained from the previous layer to perform a spatial transformation, obtaining the current layer's spatial transformation feature; the next convolutional layer then becomes the current layer and the operation repeats until the spatial transformation feature of the last convolutional layer is obtained, at which point the spatial transformation ends. The last convolutional layer's spatial transformation feature serves as the state localization feature produced by the state detection model, and it is fused with the convolutional feature output by the last convolutional layer to obtain the image feature of the state-related region. Here, "shallow" and "deep" are determined by the order in which features pass through the convolutional layers: a layer the features pass through earlier is shallower than one they pass through later.

It should be noted that the spatial transformation feature of the current layer may be obtained by fusing the convolutional feature output by the current convolutional layer with the spatial transformation feature from the previous layer and then spatially transforming the fused feature. In addition, when the current convolutional layer is the first layer, no spatial transformation feature from a previous layer exists; in that case the convolutional feature output by the first layer is spatially transformed directly to obtain the spatial transformation feature corresponding to the first convolutional layer.
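The layer-by-layer procedure above can be sketched as follows; `fuse` and `spatial_transform` are hypothetical placeholders (channel concatenation and identity) standing in for the patent's feature fusion and STN, so only the control flow is meaningful.

```python
import numpy as np

def fuse(conv_feat, loc_feat):
    # Placeholder fusion: channel-wise concatenation of the current layer's
    # convolutional feature with the previous layer's localization feature.
    return np.concatenate([conv_feat, loc_feat], axis=0)

def spatial_transform(feat):
    # Placeholder for the STN: a real network would predict crop/scale/translation
    # parameters and resample the feature map; identity keeps the sketch runnable.
    return feat

def progressive_localize(conv_features):
    # conv_features: per-layer convolutional features, ordered shallow to deep.
    loc = spatial_transform(conv_features[0])      # first layer: no previous feature
    for feat in conv_features[1:]:
        loc = spatial_transform(fuse(feat, loc))   # fuse, then transform again
    return loc  # last layer's result is the state localization feature

conv_features = [np.ones((8, 16, 16)) for _ in range(3)]  # three virtual conv layers
loc_feature = progressive_localize(conv_features)
```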

Based on the foregoing embodiment, in step 320, applying the convolutional feature output by the current layer and the spatial transformation feature obtained from the previous layer's spatial transformation to obtain the current layer's spatial transformation feature includes:

fusing the convolutional feature output by the current convolutional layer with the state localization feature obtained from the previous layer's spatial transformation to obtain the fused feature corresponding to the current layer, and spatially transforming that fused feature to obtain the state localization feature of the current layer's spatial transformation.

Shallow features contain more detail, while deep features carry more semantics; fusing the two can effectively improve model performance. Therefore, in this embodiment of the present invention, the convolutional feature output by the current convolutional layer is fused with the state localization feature from the previous layer's spatial transformation, the fused feature is spatially transformed to obtain the current layer's state localization feature, and the state localization feature of each layer is obtained layer by layer from shallow to deep, until the state localization feature of the last convolutional layer is obtained and the spatial transformation ends.

It should be noted that when the current convolutional layer is the first layer, the convolutional feature of the first layer may be spatially transformed directly to obtain the state localization feature.

In the state detection method provided by this embodiment of the present invention, the state localization feature is obtained by spatially transforming, layer by layer, the convolutional features of the layers of the multi-layer convolutional network, progressively fusing shallow state localization features with deep convolutional features so as to localize the state-related region step by step. This makes the state localization feature of the image to be detected, obtained through adaptive localization by spatial transformation, more accurate and further improves the accuracy of state detection.

Based on the foregoing embodiment, FIG. 4 is the second schematic flowchart of the state detection method provided by the present invention. As shown in FIG. 4, step 122 includes:

Step 410: based on the illumination perception network in the classification network, perform illumination intensity equalization on the image feature of the state-related region to obtain the equalized feature of the state-related region;

Step 420: based on the state classification network in the classification network, perform personnel state detection on the image to be detected using the equalized feature of the state-related region.

While a driver is driving, the environment outside the vehicle is highly variable, e.g., daytime, night, sunny or overcast conditions, so the illumination intensity of the image to be detected can differ greatly due to environmental factors. Therefore, to provide a more stable feature input to the state classification network in the classification network, the present invention performs illumination intensity equalization on the image feature before it is input into the state classification network for classification.

Specifically, an illumination perception network is built into the classification network to perform illumination intensity equalization on the image feature of the state-related region output by the state localization network. That image feature is input into the illumination perception network for the equalization operation, yielding the equalized feature of the state-related region, which is then input into the classification network for personnel state detection.

It should be noted that the illumination perception network may perform illumination restoration on the image feature through an illumination restoration model so that its illumination intensity is equalized, yielding the equalized feature; alternatively, a strong-light perception network may extract a strong-light feature, a low-light perception network may extract a low-light feature, a weight fusion network may compute an illumination intensity weight value, and the strong-light and low-light features may then be weighted according to that value to obtain the equalized feature. This embodiment of the present invention places no limitation on this.

In the state detection method provided by this embodiment of the present invention, adding an illumination perception network to the state detection model equalizes the illumination intensity of the image feature of the state-related region, making the feature input into the classification network more stable and further improving the accuracy of state detection.

Based on the foregoing embodiment, FIG. 5 is a schematic flowchart of the method for obtaining an equalized feature provided by the present invention. As shown in FIG. 5, step 410 includes:

Step 411: based on the strong-light perception network branch and the low-light perception network branch in the illumination perception network, separately perform illumination feature extraction on the image feature of the state-related region to obtain the strong-light feature and the low-light feature of the state-related region.

The illumination intensity of the image to be detected may vary greatly due to environmental factors, and driver state detection concerns driving safety and therefore demands high real-time performance. Existing illumination restoration models mainly convert complex illumination variation into slight/moderate variation through linear iteration, which requires considerable processing time; moreover, while guaranteeing real-time performance, the method must also adapt to sudden environmental changes, e.g., entering a dim tunnel from a brightly lit road. Therefore, in this embodiment of the present invention, the strong-light perception network branch, the low-light perception network branch and the weight fusion branch jointly perform illumination equalization on the image feature of the state-related region to obtain the equalized feature of the state-related region.

Specifically, the image feature of the state-related region is input into the illumination perception network, where the strong-light perception network branch and the low-light perception network branch perform illumination feature extraction in parallel, yielding the strong-light feature and the low-light feature of the state-related region, respectively.

It should be noted that the strong-light perception network branch and the low-light perception network branch share the same network structure, which includes two different types of same-dimension convolutions used to improve the network's adaptability to targets of different scales. The two branches are trained separately on different sample sets.

Step 412: based on the weight fusion branch in the illumination perception network, perform prediction on the image feature of the state-related region to obtain an illumination intensity weight value, and weight the strong-light feature and the low-light feature based on that value to obtain the equalized feature of the state-related region.

As mentioned above, to adapt to sudden environmental changes, the weight ratio between the strong-light feature and the low-light feature of the state-related region must be adjustable adaptively.

Specifically, the illumination perception network also contains a weight fusion branch, which performs weight prediction on the image feature of the state-related region to obtain the illumination intensity weight value, and then weights the strong-light feature and the low-light feature according to that value to obtain the equalized feature of the state-related region.

It should be noted that, to further improve the execution efficiency of the illumination perception network, the weight-value prediction in step 412 may run in parallel with step 411, whereas the weighting of the strong-light and low-light features in step 412 to obtain the equalized feature must wait until both step 411 and the weight prediction have completed.
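The scheduling constraint described above can be sketched with a thread pool: the two branch extractions and the weight prediction run concurrently, and the weighted combination waits for all three results. The branch functions below are hypothetical placeholders for the actual networks.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def strong_branch(feat):      # placeholder for the strong-light perception branch
    return feat * 1.2

def weak_branch(feat):        # placeholder for the low-light perception branch
    return feat * 0.8

def predict_weight(feat):     # placeholder for the weight fusion branch
    return 0.6

feat = np.ones(4)
with ThreadPoolExecutor(max_workers=3) as pool:
    f_a = pool.submit(strong_branch, feat)
    f_b = pool.submit(weak_branch, feat)
    alpha = pool.submit(predict_weight, feat)
    # The weighted sum runs only after all three futures have resolved.
    balanced = alpha.result() * f_a.result() + (1 - alpha.result()) * f_b.result()
```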

In the state detection method provided by this embodiment of the present invention, the equalized feature of the state-related region is obtained through parallel processing by the strong-light perception network branch, the low-light perception network branch and the weight fusion branch, adaptively perceiving the illumination intensity and outputting an illumination-equalized feature in a parallel manner. This improves the illumination perception network's feature extraction capability under complex illumination conditions as well as the execution efficiency of the network.

Based on the foregoing embodiment, FIG. 6 is a schematic flowchart of the state detection model training method provided by the present invention. As shown in FIG. 6, the state detection model is trained through the following steps:

Step 610: determine an initial detection model, the initial detection model including an initial state localization network, an initial illumination perception network and an initial state classification network;

Step 620: train the initial detection model based on the sample images and the personnel state category labels of the sample images to obtain the state localization network and the illumination perception network.

Detecting a personnel state mainly divides into extracting the image feature of the state-related region and classifying that extracted feature. Therefore, to make the state detection result more accurate, this embodiment of the present invention trains the state-related-region feature extraction part and the classification network separately, in stages, so as to improve both the feature extraction capability and the classification capability of the state detection model.

Specifically, in the first stage, corresponding to steps 610 and 620, an initial detection model is constructed from the initial state localization network, the initial illumination perception network and the initial state classification network, and is trained using the sample images and their personnel state category labels until training is complete. The initial state localization network and initial illumination perception network of the trained model are then taken as the state localization network and illumination perception network for use in the second training stage. The two networks use different learning rates; for example, the learning rate of the initial illumination perception network may be one fifth or one tenth of that of the initial state localization network, which is not limited in this embodiment of the present invention.

Step 630: determine an intermediate detection model, the intermediate detection model including the state localization network and the illumination perception network, as well as the initial state classification network;

Step 640: fix the parameters of the state localization network and the illumination perception network, and train the intermediate detection model based on the sample images and the personnel state category labels of the sample images to obtain the state classification network.

Specifically, in the second stage, corresponding to steps 630 and 640, an intermediate detection model is constructed from the state localization network, the illumination perception network and the initial state classification network. With the parameters of the state localization network and the illumination perception network fixed, the initial state classification network in the intermediate detection model is trained using the sample images and their personnel state category labels until training is complete, yielding the state classification network. The state localization network and illumination perception network from the first stage, together with the state classification network from the second stage, constitute the final state detection model.

Based on the foregoing embodiment, FIG. 7 is a schematic flowchart of the training method for the state classification network provided by the present invention. As shown in FIG. 7, step 640 includes:

Step 641: determine the weight of each personnel state category based on the sample data volume of the sample images under each personnel state category;

Step 642: perform loss calculation based on the state classification result output by the intermediate detection model for the sample images, the personnel state category labels of the sample images and the weight of each personnel state category, and update the parameters of the initial classification network based on the loss calculation result until the loss converges, yielding the state classification network.

The amount of image samples differs across personnel state categories, and samples of some categories are particularly difficult to obtain, e.g., a driver smoking or making a phone call. As a result, the trained model suffers from a long-tail problem: detection results for personnel state categories with few samples are inaccurate. This embodiment of the present invention therefore trains the classification network using category weights.

Specifically, the weight of each personnel state category is first determined from the sample data volume of the sample images under that category. Loss calculation is then performed based on the state classification result output by the intermediate detection model for the sample images, the personnel state category labels of the sample images and the weight of each personnel state category, and the parameters of the initial classification network are updated according to the loss calculation result; when the loss converges, training is complete and the state classification network is obtained.

It should be noted that the weight of a personnel state category may be obtained from the proportion of that category's sample data volume to the total sample data volume over all personnel state categories, or from a mapping between sample data volume and weight; this embodiment of the present invention places no limitation on this. In addition, the above loss calculation may be performed as follows:

First, the weight values of the personnel state categories, w = [w1, w2, …, wm], are determined from the sample data volumes of the categories, where m is the number of personnel state categories.
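As one concrete realization (the patent leaves the exact mapping from sample volume to weight open), the weights w = [w1, …, wm] can be taken as normalized inverse sample frequencies, so that under-represented categories such as smoking or phoning receive larger weights; the normalization chosen below is an assumption for illustration.

```python
import numpy as np

def category_weights(sample_counts):
    # sample_counts[i]: number of sample images for personnel state category i.
    counts = np.asarray(sample_counts, dtype=float)
    w = counts.sum() / counts            # inverse frequency: rare classes weigh more
    return w / w.sum() * len(counts)     # normalize so the weights average to 1

# e.g. "normal driving" is abundant while "smoking"/"phoning" are scarce
w = category_weights([1000, 800, 50, 20])
```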

Then, the loss is calculated from the category weight values w = [w1, w2, …, wm], the state prediction result p = [p1, p2, …, pm] output by the initial state classification network for a sample image, and the personnel state category label y = [y1, y2, …, ym] of the sample image. The parameters of the initial classification network are updated according to the loss calculation result; when the loss converges, training is complete and the state classification network is obtained. The loss function is specifically expressed as:

Loss = -Σ_{i=1}^{m} w_i · y_i · (1 - p_i)^γ · log(p_i)

where γ is a hyperparameter, m is the number of personnel state categories, wi is the weight value of the i-th personnel state category, yi is the label value of the i-th personnel state category of the sample image, and pi is the state prediction result for the i-th personnel state category of the sample image.
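A class-weighted, focal-style reading of the loss over the quantities defined above can be computed as follows; treating γ as a focusing exponent on (1 - pi) is an assumption consistent with the listed symbols, not a verbatim transcription of the patent's formula image.

```python
import numpy as np

def weighted_state_loss(p, y, w, gamma=2.0):
    # p: predicted probabilities per category, y: one-hot label vector,
    # w: per-category weights, gamma: focusing hyperparameter.
    p = np.clip(np.asarray(p, dtype=float), 1e-7, 1.0)
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    return float(-np.sum(w * y * (1.0 - p) ** gamma * np.log(p)))

# a confident correct prediction is penalized far less than an uncertain one
loss_confident = weighted_state_loss([0.9, 0.05, 0.05], [1, 0, 0], [1.0, 1.0, 2.0])
loss_uncertain = weighted_state_loss([0.4, 0.3, 0.3], [1, 0, 0], [1.0, 1.0, 2.0])
```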

In the state detection method provided by this embodiment of the present invention, the weight of each personnel state category is computed during training from the number of sample images of that category, and the loss calculation is based on these weights, increasing the weight of tail samples. This resolves the problem of unbalanced sample distribution and improves the recognition accuracy of the state detection model for personnel state categories with few samples.

Based on the foregoing embodiment, FIG. 8 is a network framework diagram of the state detection model provided by the present invention; the symbols in the figure denote concatenation, multiplication and addition, respectively. As shown in FIG. 8, the execution flow of the model is as follows:

Step 810: after preprocessing the driver image to be detected, extract the state localization feature using the state localization network in the state detection model. The specific operations are as follows:

Step 811: use a multi-layer convolutional network as the feature extractor; it may be any CNN, and this embodiment of the present invention takes a resnet50 CNN as an example. res1-2 performs feature extraction on the driver image to be detected to obtain the low-level feature (convolutional feature) F1, res3 performs feature extraction on F1 to obtain the mid-level feature (convolutional feature) F2, and res4 performs feature extraction on F2 to obtain the deep feature (convolutional feature) F3. First, an STN (spatial transformer network) is applied to the low-level feature F1 to localize the state-related region, yielding the state localization feature LF1.

FIG. 9 is a diagram of the STN network structure provided by the present invention; the symbol in the figure denotes multiplication. As shown in FIG. 9, the STN can apply spatial transformations such as cropping, translation and scaling to image features, adaptively discovering the discriminative regions of different states. First, the spatial transformation parameters θ = [θ1, θ2, θ3, θ4] are generated by a fully connected layer FC, where θ1 and θ2 are scaling parameters and θ3 and θ4 are translation parameters; the parameter values are constrained to the interval (0, 1) or (-1, 1) by a sigmoid or tanh function. A bounding box is obtained from these four parameters, and the new pixel coordinates are given by the following expressions:

x' = θ1·x + θ3,  y' = θ2·y + θ4

where (x, y) are the original coordinates of a pixel in the image and (x', y') are the coordinates after the transformation.
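Under the reading of the transform as per-axis scaling plus translation (θ1, θ2 scale the axes; θ3, θ4 translate them), the coordinate mapping can be sketched as:

```python
import numpy as np

def stn_transform_coords(coords, theta):
    # coords: (N, 2) array of (x, y) pixel coordinates in normalized space.
    # theta = [s_x, s_y, t_x, t_y]: scaling and translation parameters,
    # as would be predicted by the FC layer and squashed by sigmoid/tanh.
    s_x, s_y, t_x, t_y = theta
    xy = np.asarray(coords, dtype=float)
    return xy * np.array([s_x, s_y]) + np.array([t_x, t_y])

# scale by 0.5 and shift by 0.25: a centered crop covering half the extent
new_xy = stn_transform_coords([[0.0, 0.0], [1.0, 1.0]], [0.5, 0.5, 0.25, 0.25])
```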

Step 812: concatenate the state localization feature LF1 with the mid-level feature F2; the concatenated feature passes through an STN again, yielding the mid-level state localization feature LF2.

Step 813: concatenate the state localization feature LF2 with the deep feature F3; the concatenated feature passes through an STN again, yielding the final state localization feature LF3, which is fused with F3 to obtain the image feature of the state-related region. Deep CNN features have coarser resolution and may lose some fine detail; shallow features, by contrast, contain richer detail but lack contextual information. Low-level detail and high-level semantics are therefore complementary, so this embodiment of the present invention localizes the state-related region step by step through progressive localization.

步骤820,经过步骤810获得的状态相关区域的图像特征可能来自强光或者弱光条件下的驾驶员图像,因此,本发明实施例构建了光照感知网络,光照感知网络整合两个分支的有效特征,可以处理任何样式图像,通过这种方式,可以很好地解决光照差异的问题,具体操作细节如下:In step 820, the image features of the state-related area obtained in step 810 may come from the driver image under strong light or weak light conditions. Therefore, in the embodiment of the present invention, a light perception network is constructed, and the light perception network integrates the effective features of the two branches. , which can handle any style image. In this way, the problem of lighting differences can be well solved. The specific operation details are as follows:

步骤821,光照感知网络包括:强光感知网络分支和弱光感知网络分支,其中,强光感知网络分支和弱光感知网络分支均为CNN网络。图10是本发明提供的光照感知网络的网络框架图,图中,Conv表示卷积,Concat表示连接。如图10所示,分别使用不同的光照样本图像数据训练,学习特定光照条件下的特征表示。在每个分支中,第一个1×1卷积用于捕获特定于光照的特征表示。然后利用另外两个带有半通道的1×1卷积层来降低输入特征的维数,将其分为两个流,并送入两种类型的3×3卷积,以提高网络对不同尺度目标的适应性。强光感知网络分支和弱光感知网络分支的输出作为特定于光照的特征表示连接在一起。状态相关区域的图像特征分别经过强光感知网络分支和弱光感知网络分支,得到强光特征Fa和弱光特征FbStep 821, the light sensing network includes: a strong light sensing network branch and a weak light sensing network branch, wherein the strong light sensing network branch and the weak light sensing network branch are both CNN networks. FIG. 10 is a network frame diagram of the light sensing network provided by the present invention, in the figure, Conv represents convolution, and Concat represents connection. As shown in Figure 10, different illumination sample image data are used for training, and the feature representation under specific illumination conditions is learned. In each branch, the first 1×1 convolution is used to capture illumination-specific feature representations. Then two other 1×1 convolutional layers with half channels are utilized to reduce the dimensionality of the input features, which are split into two streams and fed into two types of 3×3 convolutions to improve the network’s ability to understand different Scale target adaptability. The outputs of the bright-light-sensing network branch and the low-light-sensing network branch are concatenated together as illumination-specific feature representations. The image features of the state-related regions pass through the strong light sensing network branch and the weak light sensing network branch, respectively, to obtain the strong light featureFa and the weak light feature F b.

步骤822,在实际检测过程中,往往输入的图像只是一种光照场景,为了能够自适应得到该图像的均衡特征,在光照感知网络中构建一个权重融合分支,在给定一个模态输入的情况下,自适应地集成两个支路输出的特征。这样,无论输入哪种模态,都可以得到有效的特征。本发明实施例利用强光感知网络分支和弱光感知网络分支并行输出的强光特征和弱光特征用归一化权值加权进行融合,从实现两个分支特征的自适应融合。具体使用一个基于sigmoid的权重融合分支来预测一个自适应的权值来进行模态选择来解决这个问题。Step 822: In the actual detection process, the input image usually belongs to only one illumination scene. To adaptively obtain balanced features for such an image, a weight-fusion branch is constructed in the illumination-aware network; given an input of one modality, it adaptively integrates the features output by the two branches. In this way, effective features are obtained no matter which modality is input. In the embodiment of the present invention, the strong-light and weak-light features output in parallel by the two branches are fused by weighting with normalised weights, thereby realising adaptive fusion of the two branch features. Specifically, a sigmoid-based weight-fusion branch predicts an adaptive weight for modality selection to solve this problem.

权重融合分支由一个全局平均池(GAP)层和两个全连接层(FC)组成,然后是一个具有可学习参数的Sigmoid函数。通过这种方式,可以预测一个归一化的选择权值,并执行软选择,如下所示:The weight-fusion branch consists of a global average pooling (GAP) layer and two fully connected (FC) layers, followed by a sigmoid function with a learnable parameter. In this way, a normalised selection weight can be predicted and soft selection performed as follows:

F = α*Fa + (1-α)*Fb

α = 1/(1+e^(-kx))

式中,α是选择的权重,x为状态相关区域的图像特征经过GAP和两个FC层的输出特征,k为基于训练调整得到的参数,适应不同光照的变化,通过这种方法,可以在复杂场景下预测出更合适的选择权值,适应输入跨模型数据的变化,并保持模态切换时的特征识别能力。In the formula, α is the selection weight, x is the output obtained by passing the image features of the state-related region through the GAP layer and the two FC layers, and k is a parameter adjusted during training to adapt to varying illumination. With this method, a more suitable selection weight can be predicted in complex scenes, adapting to variations in cross-modality input data while preserving the feature-recognition ability when the modality switches.
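The soft selection above (GAP, two FC layers, a sigmoid with learnable slope k, then F = α*Fa + (1-α)*Fb) can be sketched in a few lines; the FC widths and the random stand-in weights below are illustrative, not trained parameters:

```python
import numpy as np

def fuse(region_feat, f_a, f_b, w_fc1, w_fc2, k):
    """Weight-fusion branch: global average pooling, two FC layers, a sigmoid
    with learnable slope k, then soft selection F = alpha*Fa + (1-alpha)*Fb."""
    gap = region_feat.mean(axis=(1, 2))       # GAP over the spatial dims
    h = np.maximum(w_fc1 @ gap, 0.0)          # FC layer 1 + ReLU
    x = float(w_fc2 @ h)                      # FC layer 2 -> scalar x
    alpha = 1.0 / (1.0 + np.exp(-k * x))      # alpha = sigmoid(k * x)
    return alpha * f_a + (1.0 - alpha) * f_b, alpha

rng = np.random.default_rng(1)
region_feat = rng.standard_normal((8, 6, 6))
f_a = rng.standard_normal((8, 6, 6))          # strong-light feature Fa
f_b = rng.standard_normal((8, 6, 6))          # weak-light feature Fb
w_fc1 = 0.1 * rng.standard_normal((16, 8))    # stand-ins for trained weights
w_fc2 = 0.1 * rng.standard_normal((16,))
fused, alpha = fuse(region_feat, f_a, f_b, w_fc1, w_fc2, k=2.0)
```

Because α stays strictly between 0 and 1, the fused feature is always a convex combination of the two branch outputs, which is what lets a single-modality input still draw on both learned representations.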

步骤830,对经过步骤820输出的状态相关区域的均衡特征进行人员状态检测,具体操作细节如下:Step 830: Perform person-state detection on the balanced features of the state-related region output by step 820. The specific operation details are as follows:

步骤831,首先状态相关区域的均衡特征通过状态分类网络中的全局平均池化层,然后附加m个1x1卷积层,m为状态的类别数。Step 831: The balanced features of the state-related region first pass through the global average pooling layer in the state-classification network, after which m 1×1 convolutional layers are appended, where m is the number of state categories.

步骤832,对每个经过1x1卷积的特征,使用状态分类网络中的线性层获得该状态的类别预测结果,然后将所有预测的结果拼接在一起,获得最终的驾驶员状态预测结果p=[p1,p2……pm],并且根据该状态检测结果判断是否预警提醒,其中,m为人员状态的类别数。Step 832: For each feature produced by a 1×1 convolution, a linear layer in the state-classification network yields the class prediction for that state; all predictions are then concatenated to obtain the final driver-state prediction p = [p1, p2, …, pm], and whether to issue a warning is decided from this detection result, where m is the number of person-state categories.
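A minimal sketch of this per-state head follows, under the assumption that after global average pooling a 1×1 convolution reduces to a channel-mixing matrix; the state count m, layer sizes and random stand-in weights are illustrative:

```python
import numpy as np

def classify_states(balanced_feat, conv_ws, linear_ws):
    """State-classification head: global average pooling, then for each of the
    m state categories a 1x1 convolution (a channel-mixing matrix after GAP)
    followed by a linear layer; the per-state scores are concatenated."""
    gap = balanced_feat.mean(axis=(1, 2))         # (C,) pooled feature
    scores = []
    for w_conv, w_lin in zip(conv_ws, linear_ws):
        h = np.maximum(w_conv @ gap, 0.0)         # per-state 1x1 conv + ReLU
        scores.append(float(w_lin @ h))           # linear layer -> state score
    return np.array(scores)                       # p = [p1, p2, ..., pm]

rng = np.random.default_rng(2)
m = 4                                             # number of state categories
balanced_feat = rng.standard_normal((8, 6, 6))
conv_ws = [0.1 * rng.standard_normal((8, 8)) for _ in range(m)]
linear_ws = [0.1 * rng.standard_normal((8,)) for _ in range(m)]
p = classify_states(balanced_feat, conv_ws, linear_ws)
predicted_state = int(p.argmax())                 # e.g. basis for a warning
```

In deployment, the decision of whether to issue an early warning would be taken from p, for instance by thresholding or arg-max over the state scores as hinted above.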

下面对本发明提供的状态检测装置进行描述,下文描述的状态检测装置与上文描述的状态检测方法可相互对应参照。The state detection device provided by the present invention is described below, and the state detection device described below and the state detection method described above can be referred to each other correspondingly.

图11是本发明提供的状态检测装置的结构示意图。如图11所示,该装置包括:确定模块1110和检测模块1120。FIG. 11 is a schematic structural diagram of the state detection apparatus provided by the present invention. As shown in FIG. 11, the apparatus includes: a determination module 1110 and a detection module 1120.

其中,wherein,

确定模块1110,用于确定待检测图像;a determination module 1110, configured to determine the image to be detected;

检测模块1120,用于基于状态检测模型,通过空间变换自适应定位待检测图像中的状态相关区域,并基于状态相关区域对待检测图像进行人员状态检测;The detection module 1120 is configured to adaptively locate the state-related region in the image to be detected through spatial transformation based on the state detection model, and to perform person-state detection on the image to be detected based on the state-related region;

状态检测模型是基于样本图像和样本图像的人员状态类别标签训练得到的。The state detection model is trained based on the sample images and the person state category labels of the sample images.

在本发明实施例中,通过确定模块1110,用于确定待检测图像;检测模块1120,用于基于状态检测模型,通过空间变换自适应定位待检测图像中的状态相关区域,并基于状态相关区域对待检测图像进行人员状态检测;状态检测模型是基于样本图像和样本图像的人员状态类别标签训练得到的,实现了以状态相关区域为检测目标,得到在待检测图像中与状态相关的区域,减少了因固定区域检测导致的后续人员状态类别检测结果错误的问题,提高了人员状态类别检测的准确率。In the embodiment of the present invention, the determination module 1110 is used to determine the image to be detected, and the detection module 1120 is used, based on the state detection model, to adaptively locate the state-related region in the image to be detected through spatial transformation and to perform person-state detection on the image based on that region. The state detection model is trained on sample images and their person-state category labels. Taking the state-related region as the detection target yields the region of the image to be detected that is actually relevant to the state, which reduces erroneous downstream state-category detection results caused by detecting a fixed region and improves the accuracy of person-state-category detection.

基于上述任一实施例,检测模块1120包括:Based on any of the above embodiments, the detection module 1120 includes:

状态定位子模块,用于基于状态检测模型中的状态定位网络,通过空间变换自适应定位待检测图像中的状态相关区域,得到状态定位特征,并通过状态定位特征和待检测图像的卷积特征确定状态相关区域的图像特征,状态定位特征用于指示状态相关区域在待检测图像中的位置;a state-localization sub-module, configured, based on the state-localization network in the state detection model, to adaptively locate the state-related region in the image to be detected through spatial transformation to obtain a state-localization feature, and to determine the image features of the state-related region from the state-localization feature and the convolution features of the image to be detected, the state-localization feature indicating the position of the state-related region in the image to be detected;

状态分类子模块,用于基于状态检测模型中的分类网络,应用状态相关区域的图像特征对待检测图像进行人员状态检测。The state classification sub-module is used to perform personnel state detection on the image to be detected by applying the image features of the state-related region based on the classification network in the state detection model.

基于上述任一实施例,状态定位子模块包括:Based on any of the above embodiments, the state positioning submodule includes:

特征提取子模块,用于基于状态定位网络中的多层卷积网络,对待检测图像进行特征提取,得到多层卷积网络中每层卷积输出的卷积特征;The feature extraction sub-module is used to perform feature extraction on the image to be detected based on the multi-layer convolution network in the state localization network, and obtain the convolution features output by each layer of convolution in the multi-layer convolution network;

空间变换子模块,用于基于状态定位网络中的空间变换网络,应用当前层卷积输出的卷积特征与前一层空间变换所得的空间变换特征,进行空间变换,得到当前层空间变换的空间变换特征,直至得到最后一层的空间变换特征,并将最后一层的空间变换特征确定为状态定位特征,通过状态定位特征和最后一层卷积输出的卷积特征确定状态相关区域的图像特征。a spatial-transformation sub-module, configured, based on the spatial-transformation network in the state-localization network, to apply the convolution features output by the current convolutional layer together with the spatial-transformation features obtained from the previous layer's spatial transformation, perform a spatial transformation, and obtain the current layer's spatial-transformation features, repeating until the spatial-transformation features of the last layer are obtained; the last layer's spatial-transformation features are determined as the state-localization feature, and the image features of the state-related region are determined from the state-localization feature and the convolution features output by the last convolutional layer.

基于上述任一实施例,空间变换子模块具体用于:Based on any of the above embodiments, the spatial transformation sub-module is specifically used for:

用于将当前层卷积输出的卷积特征与前一层空间变换所得的状态定位特征进行特征融合,得到当前层卷积对应的融合特征,并对当前层卷积对应的融合特征进行空间变换,得到当前层空间变换的状态定位特征。It fuses the convolution features output by the current convolutional layer with the state-localization features obtained from the previous layer's spatial transformation to obtain the fusion features corresponding to the current layer, and then applies a spatial transformation to those fusion features to obtain the current layer's state-localization features.
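The fuse-then-transform step described here can be sketched with a plain affine warp and bilinear sampling, in the spirit of a spatial transformer network; the localisation head that predicts the affine parameters is passed in as a stand-in, since the patent does not fix its form:

```python
import numpy as np

def affine_warp(x, theta):
    """Bilinearly sample feature map x (C, H, W) under a 2x3 affine matrix
    theta, with target coordinates normalised to [-1, 1] as in a spatial
    transformer network."""
    C, H, W = x.shape
    ys, xs = np.meshgrid(np.linspace(-1, 1, H), np.linspace(-1, 1, W),
                         indexing="ij")
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(H * W)])  # (3, H*W)
    sx, sy = theta @ grid                     # source coordinates per pixel
    px = np.clip((sx + 1) * (W - 1) / 2, 0, W - 1)  # back to pixel indices
    py = np.clip((sy + 1) * (H - 1) / 2, 0, H - 1)
    x0 = np.floor(px).astype(int)
    y0 = np.floor(py).astype(int)
    x1 = np.clip(x0 + 1, 0, W - 1)
    y1 = np.clip(y0 + 1, 0, H - 1)
    wx, wy = px - x0, py - y0
    out = ((1 - wx) * (1 - wy) * x[:, y0, x0] + wx * (1 - wy) * x[:, y0, x1]
           + (1 - wx) * wy * x[:, y1, x0] + wx * wy * x[:, y1, x1])
    return out.reshape(C, H, W)

def cascade_step(conv_feat, prev_loc_feat, predict_theta):
    """One step of the cascade: fuse the current layer's convolution features
    with the previous layer's localisation features (channel concatenation),
    predict an affine transform from the fused features, and warp them."""
    fused = np.concatenate([conv_feat, prev_loc_feat], axis=0)
    theta = predict_theta(fused)              # stand-in localisation head
    return affine_warp(fused, theta)

rng = np.random.default_rng(3)
conv_feat = rng.standard_normal((4, 8, 8))
prev_loc = rng.standard_normal((4, 8, 8))
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
loc_feat = cascade_step(conv_feat, prev_loc, lambda f: identity)
```

With the identity matrix the warp returns the fused features unchanged; a trained localisation head would instead predict a theta that crops and zooms onto the state-related region.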

基于上述任一实施例,状态分类子模块包括:Based on any of the above embodiments, the state classification submodule includes:

均衡特征提取子模块,用于基于分类网络中的光照感知网络,对状态相关区域的图像特征进行光照强度均衡,得到状态相关区域的均衡特征;The balanced feature extraction sub-module is used to balance the light intensity of the image features of the state-related regions based on the light perception network in the classification network, and obtain the balanced features of the state-related regions;

状态检测子模块,用于基于分类网络中的状态分类网络,应用状态相关区域的均衡特征对待检测图像进行人员状态检测。The state detection sub-module is used to perform personnel state detection on the image to be detected by applying the balanced features of the state-related regions based on the state classification network in the classification network.

基于上述任一实施例,均衡特征提取子模块包括:Based on any of the above embodiments, the balanced feature extraction sub-module includes:

光照特征提取模块,用于基于光照感知网络中的强光感知网络分支和弱光感知网络分支,分别对状态相关区域的图像特征进行光照特征提取,得到状态相关区域的强光特征和状态相关区域的弱光特征;an illumination-feature extraction module, configured, based on the strong-light branch and the weak-light branch in the illumination-aware network, to extract illumination features from the image features of the state-related region respectively, obtaining the strong-light features of the state-related region and the weak-light features of the state-related region;

权重融合子模块,用于基于光照感知网络中权重融合分支,对状态相关区域的图像特征进行预测,得到光照强度权重值,并基于光照强度权重值,对强光感知特征和弱光感知特征进行加权,得到状态相关区域的均衡特征。a weight-fusion sub-module, configured, based on the weight-fusion branch in the illumination-aware network, to predict an illumination-intensity weight value from the image features of the state-related region, and, based on that weight value, to weight the strong-light and weak-light features to obtain the balanced features of the state-related region.

基于上述任一实施例,状态检测装置,还包括:训练模块,该训练模块包括:Based on any of the above embodiments, the state detection apparatus further includes: a training module, the training module includes:

构建初始模型子模块,用于确定初始检测模型;初始检测模型包括初始状态定位网络、初始光线感知网络和初始状态分类网络;Build the initial model sub-module to determine the initial detection model; the initial detection model includes the initial state positioning network, the initial light perception network and the initial state classification network;

第一阶段训练子模块,用于基于样本图像以及样本图像的人员状态类别标签对初始检测模型进行训练,得到状态定位网络和光线感知网络;The first-stage training sub-module is used to train the initial detection model based on the sample image and the person state category label of the sample image, and obtain the state localization network and light perception network;

构建中间模型子模块,用于确定中间检测模型;中间检测模型包括状态定位网络和光线感知网络,以及初始状态分类网络;Build an intermediate model sub-module to determine the intermediate detection model; the intermediate detection model includes a state localization network, a light perception network, and an initial state classification network;

第二阶段训练子模块,用于固定状态定位网络和光线感知网络的参数,基于样本图像以及样本图像的人员状态类别标签对中间检测模型进行训练,得到状态分类网络。The second stage training sub-module is used to fix the parameters of the state positioning network and the light perception network, and train the intermediate detection model based on the sample image and the person state category label of the sample image, and obtain the state classification network.

基于上述任一实施例,第二阶段训练子模块包括:Based on any of the above embodiments, the second-stage training submodule includes:

状态类别权重计算子模块,用于基于各人员状态类别下的样本图像的样本数据量,确定各人员状态类别的权重;The status category weight calculation sub-module is used to determine the weight of each personnel status category based on the sample data amount of the sample images under each personnel status category;

状态分类网络训练子模块,用于基于中间检测模型基于样本图像输出的状态分类结果、样本图像的人员状态类别标签和各人员状态类别的权重进行损失计算,并基于损失计算结果更新初始分类网络的参数,直至损失计算结果收敛,得到状态分类网络。a state-classification-network training sub-module, configured to perform loss calculation based on the state classification results the intermediate detection model outputs for the sample images, the person-state category labels of the sample images, and the weight of each person-state category, and to update the parameters of the initial classification network based on the calculated loss until the loss converges, obtaining the state-classification network.
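The class weighting described above is commonly realised as inverse-frequency weights inside a weighted cross-entropy loss; the sketch below assumes that scheme, since the patent only states that the weights are derived from the per-class sample counts:

```python
import numpy as np

def class_weights(sample_counts):
    """Weight each person-state category by inverse frequency, normalised so
    that the weights sum to the number of categories."""
    counts = np.asarray(sample_counts, dtype=float)
    inv = counts.sum() / counts               # rare categories weigh more
    return inv * len(counts) / inv.sum()

def weighted_cross_entropy(logits, label, weights):
    """Weighted cross-entropy for one sample over the state scores."""
    z = logits - logits.max()                 # numerically stable log-softmax
    log_probs = z - np.log(np.exp(z).sum())
    return -weights[label] * log_probs[label]

w = class_weights([900, 50, 50])              # imbalanced person-state counts
loss = weighted_cross_entropy(np.array([2.0, 0.5, -1.0]), label=1, weights=w)
```

Weighting the loss this way keeps rare person states (e.g. a seldom-seen distraction behaviour) from being drowned out by the majority class during the second training stage, while the localisation and illumination networks stay frozen.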

图12示例了一种电子设备的实体结构示意图,如图12所示,该电子设备可以包括:处理器(processor)1210、通信接口(Communications Interface)1220、存储器(memory)1230和通信总线1240,其中,处理器1210,通信接口1220,存储器1230通过通信总线1240完成相互间的通信。处理器1210可以调用存储器1230中的逻辑指令,以执行状态检测方法,该方法包括:确定待检测图像;基于状态检测模型,通过空间变换自适应定位待检测图像中的状态相关区域,并通过状态相关区域对待检测图像进行人员状态检测;状态检测模型是基于样本图像和样本图像的人员状态类别标签训练得到的。FIG. 12 illustrates a schematic diagram of the physical structure of an electronic device. As shown in FIG. 12, the electronic device may include: a processor 1210, a communications interface 1220, a memory 1230 and a communication bus 1240, where the processor 1210, the communications interface 1220 and the memory 1230 communicate with each other through the communication bus 1240. The processor 1210 can call the logic instructions in the memory 1230 to execute a state detection method, the method including: determining an image to be detected; based on a state detection model, adaptively locating the state-related region in the image to be detected through spatial transformation, and performing person-state detection on the image to be detected through the state-related region; the state detection model is trained based on sample images and the person-state category labels of the sample images.

此外,上述的存储器1230中的逻辑指令可以通过软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。In addition, the logic instructions in the above-mentioned memory 1230 may be implemented in the form of software functional units and, when sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

另一方面,本发明还提供一种计算机程序产品,所述计算机程序产品包括计算机程序,计算机程序可存储在非暂态计算机可读存储介质上,所述计算机程序被处理器执行时,计算机能够执行上述各方法所提供的状态检测方法,该方法包括:确定待检测图像;基于状态检测模型,通过空间变换自适应定位待检测图像中的状态相关区域,并通过状态相关区域对待检测图像进行人员状态检测;状态检测模型是基于样本图像和样本图像的人员状态类别标签训练得到的。In another aspect, the present invention also provides a computer program product, the computer program product including a computer program that can be stored on a non-transitory computer-readable storage medium; when the computer program is executed by a processor, the computer can execute the state detection method provided by the above methods, the method including: determining an image to be detected; based on a state detection model, adaptively locating the state-related region in the image to be detected through spatial transformation, and performing person-state detection on the image to be detected through the state-related region; the state detection model is trained based on sample images and the person-state category labels of the sample images.

又一方面,本发明还提供一种非暂态计算机可读存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现以执行上述各方法提供的状态检测方法,该方法包括:确定待检测图像;基于状态检测模型,通过空间变换自适应定位待检测图像中的状态相关区域,并通过状态相关区域对待检测图像进行状态检测;状态检测模型是基于样本图像和样本图像的人员状态类别标签训练得到的。In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the state detection method provided by the above methods, the method including: determining an image to be detected; based on a state detection model, adaptively locating the state-related region in the image to be detected through spatial transformation, and performing state detection on the image to be detected through the state-related region; the state detection model is trained based on sample images and the person-state category labels of the sample images.

以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性的劳动的情况下,即可以理解并实施。The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到各实施方式可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件。基于这样的理解,上述技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在计算机可读存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行各个实施例或者实施例的某些部分所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions, in essence or in the parts that contribute to the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disks or optical discs, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in the various embodiments or in some parts of the embodiments.

最后应说明的是:以上实施例仅用以说明本发明的技术方案,而非对其限制;尽管参照前述实施例对本发明进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (11)

Translated from Chinese
1.一种状态检测方法,其特征在于,包括:1. a state detection method, is characterized in that, comprises:确定待检测图像;Determine the image to be detected;基于状态检测模型,通过空间变换自适应定位所述待检测图像中的状态相关区域,并通过所述状态相关区域对所述待检测图像进行人员状态检测;Based on the state detection model, adaptively locate the state-related region in the to-be-detected image through spatial transformation, and perform personnel state detection on the to-be-detected image through the state-related region;所述状态检测模型是基于样本图像和所述样本图像的人员状态类别标签训练得到的。The state detection model is obtained by training based on the sample images and the person state category labels of the sample images.2.根据权利要求1所述的状态检测方法,其特征在于,所述基于状态检测模型,通过空间变换自适应定位所述待检测图像中的状态相关区域,并通过所述状态相关区域对所述待检测图像进行人员状态检测,包括:2 . The state detection method according to claim 1 , wherein, based on the state detection model, the state-related region in the to-be-detected image is adaptively located through spatial transformation, and the state-related region is used to detect the state-related region. 3 . Perform personnel status detection on the image to be detected, including:基于所述状态检测模型中的状态定位网络,通过空间变换自适应定位所述待检测图像中的状态相关区域,得到状态定位特征,并通过所述状态定位特征和所述待检测图像的卷积特征确定所述状态相关区域的图像特征,所述状态定位特征用于指示所述状态相关区域在所述待检测图像中的位置;Based on the state localization network in the state detection model, the state-related regions in the to-be-detected image are adaptively located through spatial transformation to obtain a state-location feature, and the convolution of the state-location feature and the to-be-detected image is performed. 
The feature determines the image feature of the state-related region, and the state-location feature is used to indicate the position of the state-related region in the to-be-detected image;基于所述状态检测模型中的分类网络,应用所述状态相关区域的图像特征对所述待检测图像进行人员状态检测。Based on the classification network in the state detection model, the image feature of the state-related region is applied to perform personnel state detection on the to-be-detected image.3.根据权利要求2所述的状态检测方法,其特征在于,所述基于所述状态检测模型中的状态定位网络,通过空间变换自适应定位所述待检测图像中的,得到状态定位特征,并通过所述状态定位特征和所述待检测图像的卷积特征确定所述状态相关区域的图像特征,包括:3. The state detection method according to claim 2, characterized in that, based on the state location network in the state detection model, the state location feature is obtained by adaptively locating in the image to be detected by spatial transformation, And determine the image features of the state-related region through the state positioning feature and the convolution feature of the to-be-detected image, including:基于所述状态定位网络中的多层卷积网络,对所述待检测图像进行特征提取,得到所述多状态相关区域层卷积网络中每层卷积输出的卷积特征;Based on the multi-layer convolution network in the state localization network, feature extraction is performed on the to-be-detected image to obtain convolution features output by each layer of convolution in the multi-state correlation region layer convolution network;基于所述状态定位网络中的空间变换网络,应用当前层卷积输出的卷积特征与前一层空间变换所得的空间变换特征,进行空间变换,得到当前层空间变换的空间变换特征,直至得到最后一层的空间变换特征,并将所述最后一层的空间变换特征确定为所述状态定位特征,通过所述状态定位特征和最后一层卷积输出的卷积特征确定所述状态相关区域的图像特征。Based on the spatial transformation network in the state positioning network, the convolution feature output by the convolution of the current layer and the spatial transformation feature obtained from the spatial transformation of the previous layer are applied to perform spatial transformation to obtain the spatial transformation feature of the spatial transformation of the current layer, until the The spatial transformation feature of the last layer, and the spatial transformation feature of the last layer is determined as the 
state positioning feature, and the state-related region is determined by the state positioning feature and the convolution feature output by the last layer of convolution image features.4.根据权利要求3所述的状态检测方法,其特征在于,所述应用当前层卷积输出的卷积特征与前一层空间变换所得的空间变换特征,进行空间变换,得到当前层空间变换的空间变换特征,包括:4. state detection method according to claim 3, is characterized in that, described applying the convolution feature of current layer convolution output and the space transformation feature obtained by previous layer space transformation, carry out space transformation, obtain current layer space transformation The spatial transformation features of , including:将当前层卷积输出的卷积特征与前一层空间变换所得的状态定位特征进行特征融合,得到当前层卷积对应的融合特征,并对当前层卷积对应的融合特征进行空间变换,得到当前层空间变换的状态定位特征。Perform feature fusion on the convolution feature output by the current layer convolution and the state positioning feature obtained by the spatial transformation of the previous layer to obtain the fusion feature corresponding to the current layer convolution, and perform spatial transformation on the fusion feature corresponding to the current layer convolution to obtain The state localization feature of the current layer space transformation.5.根据权利要求2所述的状态检测方法,其特征在于,所述基于所述状态检测模型中的分类网络,应用所述状态相关区域的图像特征对所述待检测图像进行人员状态检测,包括:5 . The state detection method according to claim 2 , wherein the state detection method based on the classification network in the state detection model applies the image features of the state-related region to perform personnel state detection on the image to be detected, 6 . 
include:基于所述分类网络中的光照感知网络,对所述状态相关区域的图像特征进行光照强度均衡,得到所述状态相关区域的均衡特征;Based on the illumination perception network in the classification network, light intensity equalization is performed on the image features of the state-related regions to obtain equalization features of the state-related regions;基于所述分类网络中的状态分类网络,应用所述状态相关区域的均衡特征对所述待检测图像进行人员状态检测。Based on the state classification network in the classification network, the balanced feature of the state-related region is applied to perform personnel state detection on the to-be-detected image.6.根据权利要求5所述的状态检测方法,其特征在于,所述基于所述分类网络中的光照感知网络,对所述状态相关区域的图像特征进行光照强度均衡,得到所述状态相关区域的均衡特征,包括:6 . The state detection method according to claim 5 , wherein the state-related region is obtained by performing illumination intensity equalization on the image features of the state-related region based on the light-sensing network in the classification network. 7 . Equilibrium characteristics of , including:基于所述光照感知网络中的强光感知网络分支和弱光感知网络分支,分别对所述状态相关区域的图像特征进行光照特征提取,得到所述状态相关区域的强光特征和所述状态相关区域的弱光特征;Based on the strong light sensing network branch and the weak light sensing network branch in the illumination sensing network, respectively perform illumination feature extraction on the image features of the state-related region, and obtain the strong-light feature of the state-related region and the state-related region. Low light characteristics of the area;基于所述光照感知网络中权重融合分支,对所述状态相关区域的图像特征进行预测,得到光照强度权重值,并基于所述光照强度权重值,对所述强光感知特征和所述弱光感知特征进行加权,得到所述状态相关区域的均衡特征。Based on the weight fusion branch in the light perception network, the image features of the state-related area are predicted to obtain a light intensity weight value, and based on the light intensity weight value, the strong light perception feature and the weak light The perceptual features are weighted to obtain balanced features of the state-related regions.7.根据权利要求5所述的状态检测方法,其特征在于,所述状态检测模型基于如下步骤训练得到:7. 
state detection method according to claim 5, is characterized in that, described state detection model is obtained based on following steps training:确定初始检测模型;所述初始检测模型包括初始状态定位网络、初始光线感知网络和初始状态分类网络;Determine an initial detection model; the initial detection model includes an initial state positioning network, an initial light perception network and an initial state classification network;基于所述样本图像以及所述样本图像的人员状态类别标签对初始检测模型进行训练,得到所述状态定位网络和所述光线感知网络;training an initial detection model based on the sample image and the person state category label of the sample image, to obtain the state localization network and the light perception network;确定中间检测模型;所述中间检测模型包括所述状态定位网络和所述光线感知网络,以及所述初始状态分类网络;determining an intermediate detection model; the intermediate detection model includes the state positioning network and the light perception network, and the initial state classification network;固定所述状态定位网络和所述光线感知网络的参数,基于所述样本图像以及所述样本图像的人员状态类别标签对所述中间检测模型进行训练,得到所述状态分类网络。The parameters of the state localization network and the light perception network are fixed, and the intermediate detection model is trained based on the sample image and the person state category label of the sample image to obtain the state classification network.8.根据权利要求7所述的状态检测方法,其特征在于,所述固定所述状态定位网络和所述光线感知网络的参数,基于所述样本图像以及所述样本图像的人员状态类别标签对所述中间检测模型进行训练,得到所述状态分类网络,包括:8 . The state detection method according to claim 7 , wherein the parameters of the state positioning network and the light perception network are fixed, based on the sample image and the pair of personnel state category labels of the sample image. 9 . 
The intermediate detection model is trained to obtain the state classification network, including:基于各人员状态类别下的样本图像的样本数据量,确定各人员状态类别的权重;Determine the weight of each personnel status category based on the sample data volume of the sample images under each personnel status category;基于所述中间检测模型基于所述样本图像输出的状态分类结果、所述样本图像的人员状态类别标签和各人员状态类别的权重进行损失计算,并基于所述损失计算结果更新所述初始分类网络的参数,直至所述损失计算结果收敛,得到所述状态分类网络。Based on the intermediate detection model, a loss calculation is performed based on the state classification result output by the sample image, the person state category label of the sample image, and the weight of each person state category, and the initial classification network is updated based on the loss calculation result. parameters until the loss calculation result converges, and the state classification network is obtained.9.一种状态检测装置,其特征在于,包括:9. A state detection device, characterized in that, comprising:确定模块,用于确定待检测图像;a determination module, used to determine the image to be detected;检测模块,用于基于状态检测模型,通过空间变换自适应定位所述待检测图像中的状态相关区域,并基于所述状态相关区域对所述待检测图像进行人员状态检测;a detection module, configured to adaptively locate a state-related region in the to-be-detected image through spatial transformation based on a state-detection model, and perform personnel state detection on the to-be-detected image based on the state-related region;所述状态检测模型是基于样本图像和所述样本图像的人员状态类别标签训练得到的。The state detection model is obtained by training based on the sample images and the person state category labels of the sample images.10.一种电子设备,包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的计算机程序,其特征在于,所述处理器执行所述程序时实现如权利要求1至8任一项所述状态检测方法的步骤。10. An electronic device, comprising a memory, a processor and a computer program stored on the memory and running on the processor, wherein the processor implements the program as claimed in claim 1 when the processor executes the program Steps of any one of the state detection methods described in to 8.11.一种非暂态计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现如权利要求1至8任一项所述状态检测方法的步骤。11. 
A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program implements the steps of the state detection method according to any one of claims 1 to 8 when the computer program is executed by a processor .
CN202210174065.2A2022-02-242022-02-24 State detection method, device, electronic device and storage mediumActiveCN114529890B (en)

Priority Applications (1)

Application NumberPriority DateFiling DateTitle
CN202210174065.2ACN114529890B (en)2022-02-242022-02-24 State detection method, device, electronic device and storage medium


Publications (2)

Publication NumberPublication Date
CN114529890Atrue CN114529890A (en)2022-05-24
CN114529890B CN114529890B (en)2025-09-23

Family

ID=81624617

Family Applications (1)

Application NumberTitlePriority DateFiling Date
CN202210174065.2AActiveCN114529890B (en)2022-02-242022-02-24 State detection method, device, electronic device and storage medium

Country Status (1)

CountryLink
CN (1)CN114529890B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication numberPriority datePublication dateAssigneeTitle
CN115631510A (en)*2022-10-242023-01-20智慧眼科技股份有限公司Pedestrian re-identification method and device, computer equipment and storage medium
CN117173104A (en)*2023-08-042023-12-05山东大学Low-altitude unmanned aerial vehicle image change detection method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number, Priority date, Publication date, Assignee, Title
US20180239987A1 (en) * 2017-02-22 2018-08-23 Alibaba Group Holding Limited Image recognition method and apparatus
CN112508015A (en) * 2020-12-15 2021-03-16 Shandong University Nameplate identification method, computer equipment and storage medium
CN112528910A (en) * 2020-12-18 2021-03-19 上海高德威智能交通系统有限公司 Hand-off steering wheel detection method and device, electronic equipment and storage medium
CN113052127A (en) * 2021-04-09 2021-06-29 上海云从企业发展有限公司 Behavior detection method, behavior detection system, computer equipment and machine readable medium
CN113051958A (en) * 2019-12-26 2021-06-29 深圳市光鉴科技有限公司 Driver state detection method, system, device and medium based on deep learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number, Priority date, Publication date, Assignee, Title
US20180239987A1 (en) * 2017-02-22 2018-08-23 Alibaba Group Holding Limited Image recognition method and apparatus
CN113051958A (en) * 2019-12-26 2021-06-29 深圳市光鉴科技有限公司 Driver state detection method, system, device and medium based on deep learning
CN112508015A (en) * 2020-12-15 2021-03-16 Shandong University Nameplate identification method, computer equipment and storage medium
CN112528910A (en) * 2020-12-18 2021-03-19 上海高德威智能交通系统有限公司 Hand-off steering wheel detection method and device, electronic equipment and storage medium
CN113052127A (en) * 2021-04-09 2021-06-29 上海云从企业发展有限公司 Behavior detection method, behavior detection system, computer equipment and machine readable medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Max Jaderberg et al.: "Spatial Transformer Networks", NIPS'15: Proceedings of the 29th International Conference on Neural Information Processing Systems, vol. 2, 7 December 2015 (2015-12-07), pages 1-4 *
LI Xiaoxing: "Research on Fatigue Driving Detection Methods Based on Deep Learning", China Master's Theses Full-text Database, Engineering Science and Technology II, no. 01, 15 January 2021 (2021-01-15), pages 1-72 *
CHENG Shuhong et al.: "Smoking Behavior Detection Based on Multi-task Classification", Acta Metrologica Sinica, vol. 41, no. 5, 28 May 2020 (2020-05-28), pages 1-3 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number, Priority date, Publication date, Assignee, Title
CN115631510A (en) * 2022-10-24 2023-01-20 智慧眼科技股份有限公司 Pedestrian re-identification method and device, computer equipment and storage medium
CN117173104A (en) * 2023-08-04 2023-12-05 Shandong University Low-altitude unmanned aerial vehicle image change detection method and system
CN117173104B (en) * 2023-08-04 2024-04-16 Shandong University Low-altitude unmanned aerial vehicle image change detection method and system

Also Published As

Publication number, Publication date
CN114529890B (en) 2025-09-23

Similar Documents

Publication, Publication Date, Title
CN111310770B (en) Target detection method and device
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN112200057A (en) Face living body detection method and device, electronic equipment and storage medium
CN111199541A (en) Image quality evaluation method, image quality evaluation device, electronic device, and storage medium
CN114549369B (en) Data restoration method and device, computer and readable storage medium
CN117197134B (en) Defect detection method, device, equipment and storage medium
CN114529890A (en) State detection method and device, electronic equipment and storage medium
CN114359892A (en) Three-dimensional target detection method and device and computer readable storage medium
WO2023086398A1 (en) 3D rendering networks based on refractive neural radiance fields
CN117475253A (en) Model training method and device, electronic equipment and storage medium
CN116805387B (en) Model training method, quality inspection method and related equipment based on knowledge distillation
CN116563926B (en) Face recognition method, system, equipment and computer readable storage medium
CN116258756B (en) A self-supervised monocular depth estimation method and system
CN117292224A (en) Training method, detection device, electronic equipment and storage medium
WO2025065808A1 (en) Emotion recognition method and system based on multi-branch fusion attention mechanism
CN117218467A (en) Model training method and related device
CN116958685A (en) Target detection method, device, equipment and storage medium based on information fusion
CN117217293A (en) Training method, device, equipment, medium and program product of prediction model
CN113642353B (en) Training method of face detection model, storage medium and terminal equipment
CN115953446A (en) Depth estimation method, device, and electronic equipment based on error correction
CN117011537A (en) Difficult sample screening method, device, computer readable medium and electronic equipment
CN115496912A (en) Method and device for feature extraction of workpiece contour based on deep learning of virtual samples
CN119180949B (en) Image recognition method and device and electronic equipment
CN113392865B (en) Image processing method, device, computer equipment and storage medium
CN119068339B (en) Distortion-free lightweight panoramic environment fine granularity sensing method

Legal Events

Date, Code, Title, Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
