CN116824491B - Visibility detection method, detection model training method, device and storage medium - Google Patents

Visibility detection method, detection model training method, device and storage medium

Info

Publication number
CN116824491B
CN116824491B (application CN202310723634.9A)
Authority
CN
China
Prior art keywords
image
visibility
detected
detection model
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310723634.9A
Other languages
Chinese (zh)
Other versions
CN116824491A (en)
Inventor
杜雨亭
姬东飞
陆勤
龚建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310723634.9A
Publication of CN116824491A
Application granted
Publication of CN116824491B
Active legal status (current)
Anticipated expiration

Abstract

Translated from Chinese


The present disclosure provides a visibility detection method, a detection model training method, a device and a storage medium, which relate to the field of artificial intelligence, specifically to image recognition and video analysis technology, and can be applied in smart cities, urban governance, and emergency management scenarios. The specific implementation scheme is: extracting a first image corresponding to a preset first target area from a surveillance video; when a first description text corresponding to the first image meets a preset rule, determining the first image as an image to be detected; using a first detection model, according to the image to be detected and the first description text, obtaining a visibility label of the image to be detected; wherein the visibility label is used to describe whether the target object in the image to be detected is visible or invisible; and determining the visibility level of the surveillance area corresponding to the surveillance video according to the visibility label of the image to be detected. The present disclosure can realize the rapid and efficient determination of the visibility level.

Description

Visibility detection method, detection model training method, device, and storage medium
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to image recognition and video analysis technology, and can be applied in smart city, urban governance, and emergency management scenarios.
Background
Based on computer vision techniques, visibility can be inferred by analyzing images captured by a camera. Such methods estimate visibility from characteristics such as contrast and blurriness in the image, and can therefore replace a professional visibility measuring instrument in some use scenarios.
Disclosure of Invention
The disclosure provides a visibility detection method, a training method and apparatus for a detection model, and a storage medium.
According to an aspect of the present disclosure, there is provided a visibility detection method including:
Extracting a first image corresponding to a preset first target area from a monitoring video;
under the condition that a first description text corresponding to the first image accords with a preset rule, determining the first image as an image to be detected;
Obtaining a visibility tag of the image to be detected according to the image to be detected and a first description text by using a first detection model, wherein the visibility tag is used for describing whether a target object in the image to be detected is in a visible state or an invisible state, and
And determining the visibility level of a monitoring area corresponding to the monitoring video according to the visibility label of the image to be detected, wherein the first target area is a part of the monitoring area.
According to another aspect of the present disclosure, there is provided a training method of a detection model, including:
generating a second descriptive text according to the sample image;
Obtaining a predicted value of a visibility tag of the sample image according to the sample image and the second description text by using a second detection model, wherein the visibility tag is used for describing whether a target object in the sample image is in a visible state or an invisible state, and
And carrying out parameter optimization on the second detection model according to the difference between the predicted value and the actual value of the visibility label of the sample image so as to train and obtain the first detection model.
According to another aspect of the present disclosure, there is provided a visibility detecting device including:
The extraction module is used for extracting a first image corresponding to a preset first target area from the monitoring video;
The image determining module is used for determining the first image as an image to be detected under the condition that a first description text corresponding to the first image accords with a preset rule;
The detection module is used for obtaining a visibility tag of the image to be detected according to the image to be detected and the first description text by using the first detection model, wherein the visibility tag is used for describing whether a target object in the image to be detected is in a visible state or an invisible state, and
The system comprises a grade determining module, a grade detecting module and a grade detecting module, wherein the grade determining module is used for determining the visibility grade of a monitoring area corresponding to a monitoring video according to the visibility label of an image to be detected, and the first target area is a part of the monitoring area.
According to another aspect of the present disclosure, there is provided a training apparatus for a detection model, including:
The generation module is used for generating a second description text according to the sample image;
The prediction module is used for obtaining a predicted value of a visibility label of the sample image according to the sample image and the second description text by using the second detection model, wherein the visibility label is used for describing whether a target object in the sample image is in a visible state or an invisible state, and
And the training module is used for carrying out parameter optimization on the second detection model according to the difference between the predicted value and the true value of the visibility label of the sample image so as to train to obtain the first detection model.
According to another aspect of the present disclosure, there is provided an electronic device including:
At least one processor, and
A memory communicatively coupled to the at least one processor, wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method according to any one of the embodiments of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to any of the embodiments of the present disclosure.
The present disclosure may enable a quick and efficient determination of a visibility level.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a visibility detection method according to an embodiment of the present disclosure;
FIG. 2 is a flow chart of a visibility detection method according to another embodiment of the present disclosure;
FIG. 3 is a flow chart of a detection model training method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a visibility detection device according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a detection model training apparatus according to an embodiment of the present disclosure;
Fig. 6 is a block diagram of an electronic device used to implement an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the related art, visibility detection based on computer vision technology places high accuracy requirements on the image processing algorithm and model. Such image processing methods may produce recognition and measurement errors in complex scenarios, and training an image processing algorithm and model that are sufficiently accurate and robust is complicated.
In order to at least partially solve one or more of the above-mentioned problems and other potential problems, embodiments of the present disclosure propose a visibility detection method with which a visibility level can be derived quickly and with a small amount of calculation.
Fig. 1 is a flow chart of a visibility detection method according to an embodiment of the present disclosure. As shown in fig. 1, the method at least comprises the following steps:
S101, extracting a first image corresponding to a preset first target area from a monitoring video.
In embodiments of the disclosure, the monitoring video can be acquired by camera devices in traffic, municipal administration, security, and similar scenarios, for example camera devices at the roadside or at intersections of urban roads, expressways, national roads, and provincial roads; any camera device that is installed outdoors and can capture the weather may be used. In general, the monitoring area that a fixedly installed camera device can record is fixed; the shooting direction and viewing angle of some camera devices can be adjusted, but even then, the overall monitoring area the device can cover is fixed.
The first target area is a part of the monitoring area, i.e. a subset of the monitoring area. Specifically, the first target area refers to a part of a monitoring screen/image obtained by the image capturing device capturing the monitoring area. The first target area can be selected and drawn in the monitoring picture according to the monitored view angle and the condition of the actual environment. The position of the first target area in the monitor screen is not limited, and in order to facilitate image processing, the first target area needs to have a certain area, for example, needs to be greater than 50×50 pixels.
After the first target area is selected, continuously extracting a first image corresponding to the first target area from one or more frames of images of the monitoring video. For an imaging device capable of adjusting the shooting direction and the viewing angle, when the first target area is selected, the shooting angle information of the imaging device needs to be recorded, so that when the first target area appears in the current shooting range, the first image is extracted.
S102, determining the first image as an image to be detected under the condition that a first description text corresponding to the first image accords with a preset rule.
The first description text may be understood as text describing the image content of the first image. It can be generated with an existing image captioning model, natural language generation model, or the like. Taking a natural language generation model as an example, the first image is input and the model outputs the first description text.
When visibility is good, the natural language generation model can normally and accurately describe the image content, for example "blue sky and trees can be seen in this image", "a speed-limit sign can be seen in the image", or "there is a building in the image". When visibility is poor, the image content may be blurred and the description text may no longer describe it accurately, for example "the trees in the image are blurred", "the road in the image is barely visible", or "the upper part of the building in the image cannot be seen". When visibility is very poor, the content may be completely blocked by fog and haze, and the description may be "an image with low visibility" or "white fog in the image". Accordingly, first images with poor visibility can be screened out through preset rules, for example by regular-expression matching on keywords: all first images whose descriptions mention blurring, unclear, invisible, low visibility, fog, haze, and the like are screened out as images to be detected. Conversely, first images with good visibility can also be screened out according to the first description text.
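As a minimal illustration of such a preset rule, the sketch below filters first images by matching low-visibility keywords in their description texts with a regular expression; the keyword list and function names are illustrative assumptions rather than part of the disclosure.

```python
import re

# Illustrative low-visibility keywords; the actual rule set is task-dependent.
LOW_VISIBILITY_KEYWORDS = [
    "blurred", "blurry", "unclear", "invisible", "cannot be seen",
    "low visibility", "fog", "haze", "mist",
]
LOW_VISIBILITY_PATTERN = re.compile(
    "|".join(map(re.escape, LOW_VISIBILITY_KEYWORDS)), re.IGNORECASE
)

def meets_preset_rule(description_text: str) -> bool:
    """Return True if the description text suggests poor visibility."""
    return LOW_VISIBILITY_PATTERN.search(description_text) is not None

def select_images_to_detect(first_images, description_texts):
    """Keep only the first images whose description text matches the rule."""
    return [
        (image, text)
        for image, text in zip(first_images, description_texts)
        if meets_preset_rule(text)
    ]
```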
The natural language generation model can be trained on high-quality labeled samples so that the features contained in the generated text are more accurate, for example "this is an image in which clear buildings and plants can be seen" or "this is an image in which no building edges can be seen", thereby enriching the information in the image descriptions.
S103, obtaining a visibility label of the image to be detected according to the image to be detected and the first description text by using the first detection model. The visibility tag is used for describing whether a target object in an image to be detected is in a visible state or an invisible state.
The first detection model is a multimodal detection model based on text and images, and typically comprises two main parts: a text processing module and an image processing module. The text processing module is responsible for processing the text information. It receives an input text description and may use natural language processing techniques such as word embeddings, recurrent neural networks, and long short-term memory networks to convert the text into feature vectors or text representations that capture semantic and contextual information. The image processing module is responsible for processing the image information. It receives an input image and uses computer vision techniques for image feature extraction. A common approach uses a pre-trained convolutional neural network (CNN) as the feature extractor; by inputting the image into the CNN, a high-level feature representation of the image can be obtained.
The text processing module and the image processing module are typically followed by a fusion operation to combine the text and image information. The fusion may be by simple stitching, weighted summation, attention mechanism, etc. to integrate the information of the two modalities.
Finally, the fused features are further used to determine a visibility tag of the image to be detected. The visibility tag may be a classification tag for distinguishing between a visible state and an invisible state of a target object in an image to be detected. The target object may be any visible object such as a road, street lamp, lane line, house, plant, etc.
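A minimal sketch of such a text-plus-image detection model is given below, assuming PyTorch, an embedding/LSTM text branch, a small CNN image branch, and fusion by concatenation; the architecture, layer sizes, and class name are illustrative assumptions and not the specific model of this disclosure.

```python
import torch
import torch.nn as nn

class MultimodalVisibilityClassifier(nn.Module):
    """Fuses a text description and an image to predict visible/invisible."""

    def __init__(self, vocab_size=10000, embed_dim=128, text_hidden=128, fused_dim=256):
        super().__init__()
        # Text branch: word embedding + LSTM, last hidden state used as the text feature.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, text_hidden, batch_first=True)
        # Image branch: small CNN feature extractor (a pre-trained CNN could be used instead).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Fusion by concatenation, then a binary classifier head.
        self.classifier = nn.Sequential(
            nn.Linear(text_hidden + 64, fused_dim), nn.ReLU(),
            nn.Linear(fused_dim, 2),  # logits for {visible, invisible}
        )

    def forward(self, token_ids, image):
        _, (h_n, _) = self.lstm(self.embedding(token_ids))
        text_feat = h_n[-1]            # (batch, text_hidden)
        image_feat = self.cnn(image)   # (batch, 64)
        fused = torch.cat([text_feat, image_feat], dim=1)
        return self.classifier(fused)

# Example: a batch of one 10-token description and one 224x224 RGB image.
model = MultimodalVisibilityClassifier()
logits = model(torch.randint(0, 10000, (1, 10)), torch.randn(1, 3, 224, 224))
```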
S104, determining the visibility level of the monitoring area corresponding to the monitoring video according to the visibility label of the image to be detected. Wherein the first target area is part of the monitoring area.
When a first target area is preset, it may be associated with one visibility level. In one example, a first target area A located close to the camera device in the monitored area is associated with a low visibility level (for example, level 1), a first target area B at an intermediate position is associated with a medium visibility level (for example, level 3), and a first target area C located farther away is associated with a high visibility level (for example, level 5). Suppose the first image of the first target area B is taken as the image to be detected because its first description text meets the preset rule. If the visibility label output by the first detection model is the invisible state, the visibility in the surveillance video is insufficient to see the first target area B, and the visibility level of the monitored area needs to be reduced to the medium visibility level corresponding to the first target area B. Similarly, suppose the first image of the first target area C is taken as the image to be detected; if the visibility label output by the first detection model is the visible state, the first target area C can be seen clearly in the surveillance video, and the visibility level of the monitored area needs to be raised to the high visibility level.
It will be appreciated that the visibility level corresponding to the first target area may be determined from the approximate distance of that area from the camera device, or from the accurate distance between the target object contained in the area and the camera device.
According to the scheme of the embodiment of the disclosure, the image to be detected is screened out through the description text of the image, and whether the image content of the image to be detected is clearly visible or not is judged based on text information and image information, so that the quick classification of the visibility is realized.
In one possible implementation manner, step S101 extracts a first image corresponding to a preset first target area from the surveillance video, and further includes the steps of:
And determining a first target area according to a first marker preset in a monitoring area corresponding to the monitoring video.
And extracting a first image corresponding to the first target area from the monitoring video.
In embodiments of the disclosure, the first target area may be set according to a fixed marker object, for example a street lamp, a sign, a building, or the like, or a mountain, the sky, or the like; the type of marker is not limited. A marker is understood as an object that is easy to distinguish from the image background, such as a large tree in a field, a testing device on a highway pavement, or a signal light on a building. In general, the first target area is a rectangular area containing a marker: after the marker is selected, a rectangular area containing it is drawn on the surveillance video picture to obtain the first target area, and its coordinate position is saved. According to this coordinate position, a first image can then be extracted from the video stream of the surveillance video at a predetermined frequency. It will be appreciated that the presence of a marker makes it easier to determine whether the description text meets a preset rule, e.g., the preset rule may be set based on whether the description text mentions the marker. It can also improve the accuracy of the first detection model in determining the visibility label.
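The sketch below illustrates extracting the first image from the video stream at a predetermined frequency using the saved coordinate position of the first target area; it assumes OpenCV, and the video source, sampling interval, and region coordinates are placeholders.

```python
import cv2

def extract_first_images(video_source, region, every_n_frames=250):
    """Crop the first target area from every N-th frame of the surveillance video.

    region is the saved coordinate position (x1, y1, x2, y2) of the first target area.
    """
    x1, y1, x2, y2 = region
    capture = cv2.VideoCapture(video_source)
    first_images = []
    frame_index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if frame_index % every_n_frames == 0:
            first_images.append(frame[y1:y2, x1:x2].copy())
        frame_index += 1
    capture.release()
    return first_images

# Example: sample roughly every 10 s of a 25 fps stream for a 200x200-pixel area.
# images = extract_first_images("surveillance.mp4", region=(400, 300, 600, 500), every_n_frames=250)
```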
According to the scheme of the embodiment of the disclosure, setting the first target area according to a marker makes it easier to judge whether the marker in the image to be detected is visible, and can improve the accuracy of the output of the first detection model.
In a possible implementation manner, step S104 determines, according to the visibility label of the image to be detected, a visibility level of a monitoring area corresponding to the monitoring video, and further includes the steps of:
s1041, determining a preset visibility level of a first marker corresponding to the image to be detected. The preset visibility level is obtained according to the distance between the first marker and the monitoring device for recording the monitoring video.
S1042, determining the latest visibility level of the monitoring area corresponding to the monitoring video.
S1043, determining the visibility level of the monitoring area corresponding to the monitoring video according to the visibility label of the image to be detected, the preset visibility level and the latest visibility level.
In the embodiment of the present disclosure, a plurality of visibility levels may be divided in advance, for example:
1. Visibility range of 20-30 km: excellent visibility;
2. Visibility range of 15-25 km: good visibility;
3. Visibility range of 10-20 km: fairly good visibility;
4. Visibility range of 5-15 km: ordinary visibility;
5. Visibility range of 1-10 km: light fog, poor visibility;
6. Visibility range of 0.3-1 km: fog, poor visibility;
7. Visibility range of less than 0.3 km: heavy fog, very poor visibility;
8. Visibility range of less than 0.1 km: dense fog, extremely poor visibility.
And determining a preset visibility level of the image to be detected from the pre-divided visibility levels according to the distance between the first marker contained in the first target area corresponding to the image to be detected and the monitoring device for recording the monitoring video.
The latest visibility level may be understood as the most recently detected visibility level in the history. Because the output of the first detection model is a visibility label of the image to be detected rather than an accurate visibility value, the visibility level for the current detection is obtained by jointly considering the latest visibility level, the visibility label obtained for the image to be detected, and the preset visibility level of the image to be detected.
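The snippet below sketches one way to derive the preset visibility level of a first marker from its distance to the monitoring device, using the pre-divided levels above; the thresholds, the function name, and the convention that a marker at a given distance is associated with the best level whose range it still reaches are assumptions for illustration.

```python
# Lower bound (in km) of the visibility range associated with each pre-divided level;
# level 1 is the best visibility and level 8 the worst (illustrative thresholds).
LEVEL_LOWER_BOUND_KM = {1: 20.0, 2: 15.0, 3: 10.0, 4: 5.0, 5: 1.0, 6: 0.3, 7: 0.1, 8: 0.0}

def preset_level_for_marker(distance_km: float) -> int:
    """Return the best (smallest-numbered) level whose range the marker distance reaches.

    If a marker at this distance is visible, visibility is at least distance_km,
    so the marker is associated with the corresponding preset level.
    """
    for level in sorted(LEVEL_LOWER_BOUND_KM):      # 1 (best) .. 8 (worst)
        if distance_km >= LEVEL_LOWER_BOUND_KM[level]:
            return level
    return max(LEVEL_LOWER_BOUND_KM)                # fallback: worst level

# Example: a marker 12 km from the camera corresponds to level 3 with these thresholds.
```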
According to the scheme of the embodiment of the disclosure, the visibility level can be determined through the visibility label of the image to be detected and the corresponding preset visibility level without accurately measuring or calculating the visibility value.
In a possible implementation manner, step S1043 obtains a visibility level of a monitoring area corresponding to the monitoring video according to the visibility tag of the image to be detected, the preset visibility level and the latest visibility level, and further includes:
and under the condition that the visibility label of the image to be detected is in an invisible state, determining the visibility level of the monitoring area according to the lower one of the preset visibility level and the latest visibility level.
In the embodiment of the disclosure, if the visibility label of the image to be detected is the invisible state, this indicates that the visibility has dropped to or below the preset visibility level corresponding to the image to be detected. In this case, if the latest visibility level is higher than the preset visibility level, it should be adjusted downward; if the latest visibility level is already lower than the preset visibility level, no adjustment is made.
According to the scheme of the embodiment of the disclosure, when any image to be detected is judged to be in an invisible state, the visibility level can be quickly reduced to the preset visibility level of the image to be detected.
In a possible implementation manner, step S1043 obtains a visibility level of a monitoring area corresponding to the monitoring video according to the visibility tag of the image to be detected, the preset visibility level and the latest visibility level, and further includes:
And under the condition that the visibility label of the image to be detected is in a visible state, determining the visibility level of the monitoring area according to the higher one of the preset visibility level and the latest visibility level.
In the embodiment of the disclosure, if the visibility label of the image to be detected is the visible state, this indicates that the visibility has reached or exceeded the preset visibility level corresponding to the image to be detected. In this case, if the latest visibility level is lower than the preset visibility level, it should be adjusted upward; if the latest visibility level is already higher than the preset visibility level, no adjustment is made.
According to the scheme of the embodiment of the disclosure, when any image to be detected is judged to be in a visible state, the visibility level can be quickly adjusted to the preset visibility level of the image to be detected.
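The two update rules above can be summarized in a small function, sketched below under the assumption that levels are encoded as integers where a smaller number means better visibility (as in the pre-divided levels), so "lower visibility" corresponds to the larger number; the function name is illustrative.

```python
def update_visibility_level(label_visible: bool, preset_level: int, latest_level: int) -> int:
    """Update the monitored area's visibility level from one detection result.

    Levels are integers where a smaller number means better visibility,
    so "lower visibility" corresponds to the larger number.
    """
    if label_visible:
        # Visible: visibility reaches at least the preset level; keep the better one.
        return min(preset_level, latest_level)
    # Invisible: visibility has dropped to the preset level or below; keep the worse one.
    return max(preset_level, latest_level)

# Example: a marker with preset level 4 judged invisible while the latest level was 2
# moves the monitored area's level to 4.
```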
In one possible implementation, the first markers in the monitoring area are multiple, and the multiple first markers in the monitoring area correspond to multiple different visibility levels.
In the embodiment of the disclosure, a plurality of first target areas can be set in one monitoring area according to a plurality of first markers, so that the monitoring area can be graded more finely. For example, in a highway scene, the farthest mountain in the monitoring area (first marker A) can correspond to an excellent visibility level, a distant house (first marker B) to a good visibility level, a big tree beside the highway (first marker C) to a general visibility level, a speed limit sign at one side of the highway (first marker D) to a poor visibility level, and a nearby testing device (first marker E) to a very poor visibility level. Then, assuming that as the visibility of the monitored area gradually decreases, the first markers A, B, C and D become invisible in their corresponding first images in turn while the first marker E is still visible in its corresponding first image, the visibility level of the monitored area is poor.
According to the scheme of the embodiment of the disclosure, according to the first markers with different visibility levels in the monitoring area, the visibility levels can be classified, so that the detection result is finer.
In one possible implementation manner, step S102 determines the first image as the image to be detected if the first description text corresponding to the first image meets the preset rule, and further includes the steps of:
And obtaining a first descriptive text according to the image content contained in the first image.
And under the condition that the first descriptive text accords with a preset rule, determining the first image as the image to be detected.
In the embodiment of the disclosure, the natural language generation model can generate descriptive text for pictures according to the pictures. Such tasks are commonly referred to as image description generation (Image Captioning), in which a model receives an input image and generates a natural language description corresponding to the image content. The model is able to understand visual features in the image and convert them into natural language text. Typically, these models use a deep learning architecture, such as convolutional neural network (Convolutional Neural Networks, CNN) as the feature extractor for the image, and recurrent neural network (Recurrent Neural Networks, RNN) as the text generator.
By training a large-scale image-text dataset, the natural language generation model can learn semantic associations between images and text and can generate descriptive text related to the input image content. This enables the model to describe scenes, objects, and relationships in the image in a linguistic manner, thereby providing a richer and more accurate description of the image.
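As an illustration of generating the first description text from a first image, the sketch below uses an off-the-shelf image-captioning pipeline; the library, model name, and output format are assumptions, and any comparable image-to-text model trained on image-caption pairs could be substituted.

```python
from transformers import pipeline  # assumes the Hugging Face transformers library is installed

# A publicly available captioning model is assumed here for illustration.
captioner = pipeline("image-to-text", model="nlpconnect/vit-gpt2-image-captioning")

def generate_description_text(image_path: str) -> str:
    """Generate a natural-language description for the given first image."""
    outputs = captioner(image_path)
    return outputs[0]["generated_text"]

# Example: description = generate_description_text("first_target_area.jpg")
```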
The preset rule can be adjusted according to the detection task. For example, when the weather forecast issues a fog warning, first images with poor visibility can be screened out through the preset rule, so that only those images are detected, the level to which visibility has dropped is detected rapidly, and the amount of calculation of the detection task is reduced. If heavy fog has already occurred and the task is to judge whether it is dissipating, first images with good visibility can be screened out through the preset rule and used to detect to which level the visibility has improved.
According to the scheme of the embodiment of the disclosure, the first images to be detected can be screened out through the description text, so that the number of the images to be detected is reduced, the calculated amount is reduced, and the calculation resources are saved.
In a possible implementation manner, step S103 obtains, by using the first detection model, a visibility tag of the image to be detected according to the image to be detected and a first description text corresponding to the image to be detected, and further includes:
And inputting a first description text corresponding to the image to be detected into a first detection model to obtain a first text feature.
And inputting the image information of the image to be detected into a first detection model to obtain a first image feature.
And obtaining the similarity between the first text feature and the first image feature by using the first detection model.
And obtaining the visibility label of the image to be detected according to the similarity and a preset threshold value.
In embodiments of the present disclosure, the first detection model may determine a similarity, such as a cosine similarity, between the first text feature and the first image feature. If the input image to be detected was screened out as a first image with low visibility, a higher similarity means that the image features also indicate low visibility, so the image to be detected belongs to the lower-visibility category; a lower similarity means that the image features do not indicate low visibility, so the image may belong to the higher-visibility category. The similarity result can be mapped to the visibility label by a preset threshold, for example, if the similarity is greater than 0.8, the invisible state is determined.
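A minimal sketch of this similarity-and-threshold step follows, assuming the first text feature and first image feature are already extracted as vectors; the 0.8 threshold mirrors the example above and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def visibility_label_from_features(text_feature: torch.Tensor,
                                   image_feature: torch.Tensor,
                                   threshold: float = 0.8) -> str:
    """Map the text/image feature similarity to a visibility label.

    Assumes the text feature describes a low-visibility scene, so a high
    similarity means the image also looks like a low-visibility scene.
    """
    similarity = F.cosine_similarity(text_feature, image_feature, dim=-1).item()
    return "invisible" if similarity > threshold else "visible"

# Example with dummy 256-dimensional features:
# label = visibility_label_from_features(torch.randn(256), torch.randn(256))
```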
According to the scheme of the embodiment of the disclosure, the multi-mode detection is performed based on the text information and the image information, so that the accuracy of the detection result is improved.
Fig. 2 is a flow chart of a visibility detection method according to an embodiment of the present disclosure. In one possible implementation, as shown in fig. 2, the method at least includes the following steps:
step one, accessing a monitoring video.
And secondly, drawing a visibility region on the monitoring video to obtain a plurality of first target regions, wherein the first target regions correspond to a plurality of visibility levels.
And thirdly, inputting the first area (i.e., a first target area), the second area, ..., and the Nth area into a classification network (the first detection model).
And step four, determining the visibility level according to the labels output by the classification network, i.e., whether each area is visible. Each visibility level has a corresponding visibility range, and when the detected visibility range is smaller than the lowest warning range, a warning is issued.
It should be noted that, where a visibility meter is installed in the monitoring area, the measurement obtained by the visibility meter can be compared with the detection result output by the first detection model, and the visibility range can be obtained by combining the two results and their confidence levels, so that the detection result is more accurate.
Fig. 3 is a flowchart of a training method of a detection model according to an embodiment of the disclosure. The method at least comprises the following steps:
s301, generating a second description text according to the sample image.
S302, obtaining a predicted value of the visibility label of the sample image according to the sample image and the second description text by using the second detection model. The visibility tag is used for describing whether a target object in the sample image is in a visible state or an invisible state.
S303, performing parameter optimization on the second detection model according to the difference between the predicted value and the actual value of the visibility label of the sample image so as to train to obtain a first detection model.
In the embodiment of the disclosure, the method for generating the second description text from the sample image is the same as the method for generating the first description text from the first image. The visibility information of the sample image can be acquired at a monitoring point equipped with a visibility meter: the accurate visibility value measured by the visibility meter is associated with the sample image acquired by the camera device, which yields the visibility information of the sample image and, from it, the true value of the visibility label.
The second detection model may be an initial detection model or a model obtained by training the initial detection model several times. When the model converges or the preset number of training iterations is reached, the first detection model is obtained. The first detection model may be used to perform the visibility detection method described in any of the embodiments above.
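A condensed training-loop sketch for the second detection model is given below, assuming PyTorch, a model that returns logits for the two label states (such as the multimodal classifier sketched earlier), and a dataloader yielding (token_ids, image, true_label) batches; all names and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def train_detection_model(second_model, dataloader, epochs=10, lr=1e-4, device="cpu"):
    """Optimize the second detection model against the true visibility labels."""
    second_model.to(device)
    criterion = nn.CrossEntropyLoss()                       # difference between predicted and true labels
    optimizer = torch.optim.Adam(second_model.parameters(), lr=lr)
    for epoch in range(epochs):
        for token_ids, images, true_labels in dataloader:   # labels: 0 = visible, 1 = invisible
            token_ids = token_ids.to(device)
            images = images.to(device)
            true_labels = true_labels.to(device)
            logits = second_model(token_ids, images)         # predicted visibility label values
            loss = criterion(logits, true_labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return second_model                                      # trained model used as the first detection model
```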
According to the scheme of the embodiment of the disclosure, through the input of the multi-mode information, the accuracy of the detection result of the detection model can be improved.
In one possible implementation manner, step S301 generates a second description text according to the sample image, and further includes:
And determining a second target area according to a second marker preset in the monitoring area corresponding to the monitoring video.
And extracting a sample image corresponding to the second target area from the monitoring video.
And obtaining a second descriptive text according to the image content contained in the sample image.
In a possible implementation manner, step S302 obtains, using the second detection model, a predicted value of the visibility tag of the sample image according to the sample image and the second descriptive text, and further includes:
And inputting a second descriptive text corresponding to the sample image into a second detection model to obtain a second text feature.
And inputting the image information of the sample image into a second detection model to obtain a second image feature.
And obtaining the similarity between the second text feature and the second image feature by using the second detection model.
And obtaining a predicted value of the visibility label of the sample image according to the similarity and a preset threshold value.
Fig. 4 is a schematic structural diagram of a visibility detecting device according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus includes at least:
The extracting module 401 is configured to extract a first image corresponding to a preset first target area from the surveillance video.
The image determining module 402 is configured to determine the first image as the image to be detected if the first description text corresponding to the first image meets a preset rule.
The detection module 403 is configured to obtain a visibility tag of the image to be detected according to the image to be detected and the first description text by using the first detection model. The visibility tag is used for describing whether a target object in an image to be detected is in a visible state or an invisible state. And
The level determining module 404 is configured to determine, according to the visibility label of the image to be detected, a visibility level of a monitoring area corresponding to the monitoring video. Wherein the first target area is part of the monitoring area.
In one possible implementation, the extraction module 401 is configured to:
And determining a first target area according to a first marker preset in a monitoring area corresponding to the monitoring video.
And extracting a first image corresponding to the first target area from the monitoring video.
In one possible implementation, the rank determination module 404 includes:
the first determining submodule is used for determining a preset visibility level of a first marker corresponding to the image to be detected. The preset visibility level is obtained according to the distance between the first marker and the monitoring device for recording the monitoring video.
And the second determining submodule is used for determining the latest visibility level of the monitoring area corresponding to the monitoring video.
And the third determining submodule is used for determining the visibility level of the monitoring area corresponding to the monitoring video according to the visibility label of the image to be detected, the preset visibility level and the latest visibility level.
In one possible implementation, the third determining submodule is configured to:
and under the condition that the visibility label of the image to be detected is in an invisible state, determining the visibility level of the monitoring area according to the lower one of the preset visibility level and the latest visibility level.
In one possible implementation, the third determining submodule is configured to:
And under the condition that the visibility label of the image to be detected is in a visible state, determining the visibility level of the monitoring area according to the higher one of the preset visibility level and the latest visibility level.
In one possible implementation, the first markers in the monitoring area are multiple, and the multiple first markers in the monitoring area correspond to multiple different visibility levels.
In one possible implementation, the image determination module 402 is configured to:
And obtaining a first descriptive text according to the image content contained in the first image.
And under the condition that the first descriptive text accords with a preset rule, determining the first image as the image to be detected.
In one possible implementation, the detection module 403 is configured to:
And inputting a first description text corresponding to the image to be detected into a first detection model to obtain a first text feature.
And inputting the image information of the image to be detected into a first detection model to obtain a first image feature.
And obtaining the similarity between the first text feature and the first image feature by using the first detection model.
And obtaining the visibility label of the image to be detected according to the similarity and a preset threshold value.
For descriptions of specific functions and examples of each module and sub-module of the apparatus in the embodiments of the present disclosure, reference may be made to the related descriptions of corresponding steps in the foregoing method embodiments, which are not repeated herein.
Fig. 5 is a schematic structural diagram of a training device for a detection model according to an embodiment of the disclosure. As shown in fig. 5, the apparatus includes at least:
a generating module 501, configured to generate a second description text according to the sample image.
And the prediction module 502 is configured to obtain, according to the sample image and the second description text, a predicted value of the visibility tag of the sample image by using the second detection model. The visibility tag is used for describing whether a target object in the sample image is in a visible state or an invisible state.
And the training module 503 is configured to perform parameter optimization on the second detection model according to a difference between the predicted value and the actual value of the visibility label of the sample image, so as to train to obtain the first detection model.
In one possible implementation, the generating module 501 is configured to:
And determining a second target area according to a second marker preset in the monitoring area corresponding to the monitoring video.
And extracting a sample image corresponding to the second target area from the monitoring video.
And obtaining a second descriptive text according to the image content contained in the sample image.
In one possible implementation, the prediction module 502 is configured to:
And inputting a second descriptive text corresponding to the sample image into a second detection model to obtain a second text feature.
And inputting the image information of the sample image into a second detection model to obtain a second image feature.
And obtaining the similarity between the second text feature and the second image feature by using the second detection model.
And obtaining a predicted value of the visibility label of the sample image according to the similarity and a preset threshold value.
For descriptions of specific functions and examples of each module and sub-module of the apparatus in the embodiments of the present disclosure, reference may be made to the related descriptions of corresponding steps in the foregoing method embodiments, which are not repeated herein.
In the technical scheme of the disclosure, the collection, storage, application and other processing of the user personal information involved all comply with the relevant laws and regulations and do not violate public order and good morals.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including an input unit 606, e.g., keyboard, mouse, etc., an output unit 607, e.g., various types of displays, speakers, etc., a storage unit 608, e.g., magnetic disk, optical disk, etc., and a communication unit 609, e.g., network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The calculation unit 601 performs the respective methods and processes described above, such as the visibility detection method, the training method of the detection model. For example, in some embodiments, the visibility detection method, the training method of the detection model, may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the visibility detection method, the training method of the detection model described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the visibility detection method, the training method of the detection model, in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special or general purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other types of devices may also be used to provide interaction with the user, for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the disclosed aspects are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions, improvements, etc. that are within the principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (19)

CN202310723634.9A | 2023-06-16 | 2023-06-16 | Visibility detection method, detection model training method, device and storage medium | Active | CN116824491B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310723634.9A (CN116824491B (en)) | 2023-06-16 | 2023-06-16 | Visibility detection method, detection model training method, device and storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310723634.9A (CN116824491B (en)) | 2023-06-16 | 2023-06-16 | Visibility detection method, detection model training method, device and storage medium

Publications (2)

Publication Number | Publication Date
CN116824491A | 2023-09-29
CN116824491B | 2025-04-18

Family

ID=88126925

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202310723634.9A (Active, CN116824491B (en)) | Visibility detection method, detection model training method, device and storage medium | 2023-06-16 | 2023-06-16

Country Status (1)

Country | Link
CN (1) | CN116824491B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN119132092A (en)* | 2024-08-05 | 2024-12-13 | 成都天海宸光科技有限公司 | A high-speed emergency lane occupancy warning method based on multimodal large model region generation


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN111754474A (en)* | 2020-06-17 | 2020-10-09 | 上海眼控科技股份有限公司 | A method and device for visibility recognition based on image clarity
CN111967332B (en)* | 2020-07-20 | 2021-08-31 | 禾多科技(北京)有限公司 | Method and device for generating visibility information for automatic driving
CN113128581A (en)* | 2021-04-13 | 2021-07-16 | 天津市滨海新区气象局(天津市滨海新区气象预警中心) | Visibility detection method, device and system based on machine learning and storage medium
CN115880499B (en)* | 2023-02-22 | 2023-05-05 | 北京猫猫狗狗科技有限公司 | Occluded target detection model training method, occluded target detection model training device, medium and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
WO2023020005A1 (en)* | 2021-08-17 | 2023-02-23 | 北京百度网讯科技有限公司 | Neural network model training method, image retrieval method, device, and medium
CN115761839A (en)* | 2022-10-21 | 2023-03-07 | 北京百度网讯科技有限公司 | Training method of human face living body detection model, human face living body detection method and device
CN116091456A (en)* | 2023-01-31 | 2023-05-09 | 安徽中科天达信息技术有限公司 | Road visibility detection method

Also Published As

Publication number | Publication date
CN116824491A (en) | 2023-09-29

Similar Documents

Publication | Title
CN110472599B (en) | Object quantity determination method and device, storage medium and electronic equipment
CN114612835A (en) | A UAV target detection model based on YOLOv5 network
KR102507501B1 (en) | Artificial Intelligence-based Water Quality Contaminant Monitoring System and Method
CN111798360A (en) | Watermark detection method, watermark detection device, electronic equipment and storage medium
US20230245429A1 (en) | Method and apparatus for training lane line detection model, electronic device and storage medium
CN112287983B (en) | A remote sensing image target extraction system and method based on deep learning
US11709870B2 (en) | Comprehensive utility line database and user interface for excavation sites
CN111881777B (en) | Video processing method and device
CN113742440A (en) | Road image data processing method and device, electronic equipment and cloud computing platform
CN117808708A (en) | Cloud and fog remote sensing image processing method, device, equipment and medium
CN116824491B (en) | Visibility detection method, detection model training method, device and storage medium
CN111444803A (en) | Image processing method, image processing device, electronic equipment and storage medium
Ranyal et al. | Automated pothole condition assessment in pavement using photogrammetry-assisted convolutional neural network
CN114119545A (en) | A highway visibility estimation method, system, equipment and storage medium
Chavan et al. | Billboard detection in the wild
CN112288701A (en) | Intelligent traffic image detection method
CN115761655A (en) | Target tracking method and device
CN113011298B (en) | Truncated object sample generation, target detection method, road side equipment and cloud control platform
CN109934185B (en) | Data processing method and device, medium and computing equipment
CN114780655B (en) | Model training and map data processing method, device, equipment and storage medium
US20240233343A1 (en) | Vector Map Verification
CN115359468A (en) | Target website identification method, device, equipment and medium
Li et al. | Context-aware and boundary-optimized model for road marking instance segmentation using MLS point cloud intensity images
CN111753625B (en) | Pedestrian detection method, device, equipment and medium
Bai et al. | Flood data analysis on SpaceNet 8 using Apache Sedona

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
