CN116137071A - Method, device and equipment for detecting video recording and computer readable storage medium - Google Patents

Method, device and equipment for detecting video recording and computer readable storage medium

Info

Publication number
CN116137071A
CN116137071A
Authority
CN
China
Prior art keywords
video
target
frame
initial
position information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310142716.4A
Other languages
Chinese (zh)
Inventor
李云龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Boe Smart Technology Co ltd
BOE Technology Group Co Ltd
Original Assignee
Chengdu Boe Smart Technology Co ltd
BOE Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Boe Smart Technology Co ltd and BOE Technology Group Co Ltd
Priority to CN202310142716.4A
Publication of CN116137071A
Legal status: Pending

Abstract

The application discloses a method, an apparatus, a device, and a computer-readable storage medium for detecting video recording behavior, belonging to the technical field of image processing. The method comprises the following steps: acquiring the consecutive multi-frame video pictures included in a video to be subjected to recording detection; determining an initial video picture among the multi-frame video pictures, the initial video picture being the first video picture in which both a target part and a recording device in a target form appear; determining a target intersection-over-union (IoU) ratio between the detection frame containing the recording device in the target form and the detection frame containing the target part in the initial video picture; when the target intersection ratio is greater than a reference threshold, processing the video pictures other than the initial video picture to obtain the target occurrence number of the recording device in the target form in the video; and determining, according to the target occurrence number, whether the content of the video includes recording behavior. The method improves the efficiency and accuracy of recording detection.

Description

Method, device and equipment for detecting video recording and computer readable storage medium
Technical Field
The embodiments of the present application relate to the technical field of image processing, and in particular to a method, an apparatus, a device, and a computer-readable storage medium for detecting video recording.
Background
Recording here refers to photographing or video recording. In some scenarios, users do not wish to be recorded, so a recording detection method is needed to determine whether recording behavior is present.
In the related art, whether the content of a video includes recording behavior is determined by manually viewing the video. However, this approach relies on the viewer's subjective judgment, so the accuracy of recording detection is low. Moreover, manual inspection makes recording detection costly and inefficient.
Disclosure of Invention
The embodiments of the present application provide a method, an apparatus, a device, and a computer-readable storage medium for detecting video recording, which can solve the problems of low accuracy, high cost, and low efficiency of recording detection in the related art. The technical solution is as follows:
in one aspect, an embodiment of the present application provides a method for detecting a video recording, where the method includes:
acquiring continuous multi-frame video pictures included in a video to be subjected to shooting detection;
determining an initial video picture in the multi-frame video pictures, wherein the initial video picture is the first video picture in which both a target part and a recording device in a target form appear;
determining a target intersection ratio between the detection frame containing the recording device in the target form and the detection frame containing the target part in the initial video picture;
processing, based on the target intersection ratio being greater than a reference threshold, the video pictures other than the initial video picture in the multi-frame video pictures, to obtain the target occurrence number of the recording device in the target form in the video;
and determining, according to the target occurrence number, whether the content of the video includes recording behavior.
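The target intersection ratio above is the standard intersection-over-union (IoU) of two axis-aligned detection frames. A minimal sketch, assuming boxes encoded as (x1, y1, x2, y2) corner coordinates and an illustrative reference threshold (the patent fixes neither):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Device detection frame vs. target-part detection frame in the initial picture;
# the 0.1 reference threshold is illustrative only.
device_box, part_box = (10, 10, 50, 50), (30, 30, 70, 70)
print(iou(device_box, part_box) > 0.1)  # True
```

The subsequent tracking steps only run when this IoU test passes, which filters out frames where a device appears but is not held near the target part.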
In one possible implementation, processing the video pictures other than the initial video picture in the multi-frame video pictures to obtain the target occurrence number of the recording device in the target form in the video includes:
acquiring an initial matrix according to the target number frame video pictures in the multi-frame video pictures, wherein the initial matrix comprises a submatrix of the target number frame video pictures, the submatrix of any video picture is used for indicating the position information of a detection frame of the video camera equipment containing the target form in any video picture, and the target number frame video pictures are continuous target number frame video pictures starting from the initial video picture;
acquiring first position information of a detection frame of the video recording device containing the target form in a first video picture, wherein the first video picture is a video picture which continuously appears after the target number of frames of video pictures in the multi-frame video picture;
acquiring position information of a detection frame of the photographing equipment containing the target form in a video picture corresponding to each submatrix according to the submatrices included in the initial matrix to obtain a plurality of second position information;
acquiring a first occurrence number of the photographing equipment of the target form from the initial video picture to the first video picture according to the first position information and the plurality of second position information;
and determining, according to the initial matrix and the first occurrence number, the target occurrence number of the recording device in the target form in the video.
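The initial matrix above acts as a fixed-length buffer holding, for each of the target number of frames, the sub-matrix (here reduced to one detection-frame position) of the device. A sketch under two simplifying assumptions not fixed by the patent: one detection frame per picture, and (x1, y1, x2, y2) box encoding:

```python
from collections import deque

TARGET_NUMBER = 5  # illustrative window length; the patent leaves it open


def build_initial_matrix(frame_boxes):
    """One sub-matrix (here: one box) per frame, starting from the initial picture."""
    assert len(frame_boxes) >= TARGET_NUMBER
    return deque(frame_boxes[:TARGET_NUMBER], maxlen=TARGET_NUMBER)


# Device boxes detected in 6 consecutive video pictures (synthetic data).
boxes = [(10, 10, 50, 50), (11, 10, 51, 50), (12, 11, 52, 51),
         (12, 12, 52, 52), (13, 12, 53, 52), (14, 13, 54, 53)]
matrix = build_initial_matrix(boxes)
first_position = boxes[TARGET_NUMBER]   # "first video picture" after the window
second_positions = list(matrix)         # one second-position per sub-matrix
print(len(second_positions))            # 5
```

Using `deque(maxlen=...)` makes the later update step (drop the oldest sub-matrix, append the newest) a single `append` call.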
In one possible implementation manner, the obtaining, according to the first location information and the plurality of second location information, a first number of occurrences of the target form of the recording device from the initial video frame to the first video frame includes:
determining first cross ratios between the detection frames determined by the first position information and the detection frames determined by the plurality of second position information respectively to obtain a target number of first cross ratios;
determining a first number of first cross ratios of the target number of first cross ratios greater than a target threshold;
based on the first number being greater than a number threshold, taking a first numerical value as the first occurrence number of the recording device in the target form from the initial video picture to the first video picture;
and based on the first number not larger than the number threshold, taking a second numerical value as a first occurrence number of the photographing device of the target form from the initial video picture to the first video picture, wherein the second numerical value is smaller than the first numerical value.
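The decision above can be sketched as: score the first video picture's box against every buffered box by IoU, count how many scores clear the target threshold, and assign the higher first value when that count clears the number threshold, else the lower second value. All thresholds and values below are illustrative, not from the patent:

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union else 0.0

TARGET_THRESHOLD = 0.5             # IoU threshold (illustrative)
NUMBER_THRESHOLD = 2               # "number threshold" (illustrative)
FIRST_VALUE, SECOND_VALUE = 2, 1   # second value < first value, per the claim


def first_occurrence_count(first_position, second_positions):
    ratios = [iou(first_position, p) for p in second_positions]  # first cross ratios
    first_number = sum(r > TARGET_THRESHOLD for r in ratios)     # matching boxes
    return FIRST_VALUE if first_number > NUMBER_THRESHOLD else SECOND_VALUE


window = [(10, 10, 50, 50), (11, 10, 51, 50), (12, 11, 52, 51),
          (12, 12, 52, 52), (13, 12, 53, 52)]
print(first_occurrence_count((14, 13, 54, 53), window))  # 2: box tracks the window
```

A box that overlaps most of the window is treated as a persistent device (higher count contribution); a box with little overlap may be a spurious detection and contributes less.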
In one possible implementation manner, the determining, according to the initial matrix and the first occurrence number, a target occurrence number of the recording device in the target form in the video includes:
updating the initial matrix according to the submatrices of the first video picture to obtain a target matrix, wherein the target matrix comprises the submatrices of the first video picture;
acquiring third position information of a detection frame of the video recording equipment containing the target form in a second video picture, wherein the second video picture is a video picture which is behind the first video picture and is adjacent to the first video picture;
acquiring position information of a detection frame of the photographing equipment containing the target form in a video picture corresponding to each submatrix according to a plurality of submatrices included in the target matrix to obtain a plurality of fourth position information;
updating the first occurrence number according to the third position information and the plurality of fourth position information to obtain a second occurrence number of the shooting equipment in the target form from the initial video picture to the second video picture;
and traversing video pictures except the target number of frames of video pictures, the first video picture and the second video picture in the video according to the updating process to obtain the target occurrence times of the shooting equipment in the target form in the video.
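The traversal above reads as a sliding-window loop: each new video picture is scored against the buffered sub-matrices, the occurrence count is updated, and the window slides forward by one frame. A compressed sketch (window length, thresholds, and the per-frame increments are illustrative simplifications of the claimed update):

```python
from collections import deque


def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union else 0.0

WINDOW, IOU_T, NUM_T = 5, 0.5, 2


def target_occurrences(frame_boxes):
    """Accumulate, over the video, a count of frames whose device box tracks the window."""
    matrix = deque(frame_boxes[:WINDOW], maxlen=WINDOW)  # initial matrix
    count = 0
    for box in frame_boxes[WINDOW:]:            # first, second, ... video pictures
        matches = sum(iou(box, old) > IOU_T for old in matrix)
        count += 2 if matches > NUM_T else 1    # illustrative increment values
        matrix.append(box)                      # window slides: oldest drops out
    return count


boxes = [(10 + i, 10 + i, 50 + i, 50 + i) for i in range(8)]  # drifting device box
print(target_occurrences(boxes))  # 6: three tracked frames, +2 each
```

Because the window always holds the most recent frames, a device that disappears and later reappears in a different place scores low against the buffer and contributes the smaller increment.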
In one possible implementation manner, the updating the initial matrix according to the sub-matrix of the first video picture to obtain the target matrix includes:
deleting the submatrices of the initial video pictures in the initial matrix to obtain a reference matrix;
and acquiring the target matrix according to the reference matrix and the submatrices of the first video picture.
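The matrix update above is a first-in-first-out replacement: the oldest sub-matrix (that of the initial video picture) is deleted to form the reference matrix, and the first video picture's sub-matrix is appended to form the target matrix. A sketch with sub-matrices reduced to single boxes (an assumption for illustration):

```python
def update_matrix(matrix, new_submatrix):
    """Delete the oldest sub-matrix, then append the new one (fixed-length FIFO)."""
    reference = matrix[1:]               # reference matrix: oldest entry removed
    return reference + [new_submatrix]   # target matrix: same length as before


initial = [(10, 10, 50, 50), (11, 10, 51, 50), (12, 11, 52, 51)]
target = update_matrix(initial, (13, 12, 53, 52))
print(target[0], target[-1])  # (11, 10, 51, 50) (13, 12, 53, 52)
```

The target matrix therefore always describes the most recent target-number frames, which is what lets the same occurrence-counting step be reused for every subsequent video picture.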
In one possible implementation manner, the updating the first occurrence number according to the third location information and the plurality of fourth location information to obtain a second occurrence number of the target form of the recording device between the initial video frame and the second video frame includes:
determining second cross-over ratios between the detection frames determined by the third position information and the detection frames determined by the fourth position information respectively to obtain a target number of second cross-over ratios;
determining a second number of second cross ratios greater than a target threshold among the target number of second cross ratios;
and updating the first occurrence times according to the second number to obtain a second occurrence times of the photographing equipment of the target form from the initial video picture to the second video picture.
In one possible implementation manner, the updating the first occurrence number according to the second number, to obtain a second occurrence number of the target form of the recording device between the initial video frame and the second video frame includes:
updating the first occurrence times according to the second number to obtain reference times;
acquiring reference position information of a detection frame of the video recording device containing the target form in a reference video picture, wherein the reference video picture is a video picture which is in front of the second video picture and is separated from the second video picture by a target number of video pictures;
determining a third intersection ratio between the detection frame determined by the reference position information and the detection frame determined by the third position information;
and adjusting the reference number according to the third intersection ratio to obtain the second occurrence number of the recording device in the target form from the initial video picture to the second video picture.
In one possible implementation manner, the updating the first occurrence number according to the second number to obtain the reference number includes:
based on the second number being greater than a number threshold, adding a third numerical value on the basis of the first occurrence number to obtain a reference number;
and adding a fourth numerical value based on the first occurrence number based on the second number not larger than the number threshold value to obtain a reference number, wherein the fourth numerical value is smaller than the third numerical value.
In one possible implementation, the adjusting the reference number according to the third intersection ratio to obtain a second occurrence number of the recording device in the target form from the initial video picture to the second video picture includes:
based on the third intersection ratio being greater than a target threshold, taking the reference number of times as a second number of occurrences of the recording device of the target modality between the initial video picture and the second video picture;
and based on the third intersection ratio not being greater than the target threshold, adding a fifth numerical value to the reference number to obtain the second occurrence number of the recording device in the target form from the initial video picture to the second video picture.
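The adjustment above adds a consistency check against a frame a fixed distance back: the provisional reference count is kept as-is when the new box still overlaps that earlier box (the same device instance persisting), and is bumped by a fifth value when the overlap is lost. A sketch with illustrative threshold and value:

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union else 0.0

TARGET_THRESHOLD = 0.5  # illustrative
FIFTH_VALUE = 1         # illustrative


def adjust(reference_count, reference_box, third_box):
    third_ratio = iou(reference_box, third_box)  # third intersection ratio
    if third_ratio > TARGET_THRESHOLD:           # same device instance: keep count
        return reference_count
    return reference_count + FIFTH_VALUE         # overlap lost: adjust the count


print(adjust(4, (10, 10, 50, 50), (12, 11, 52, 51)))      # 4: boxes still overlap
print(adjust(4, (10, 10, 50, 50), (200, 200, 240, 240)))  # 5: overlap lost
```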
In one possible implementation manner, the determining whether the content of the video includes a recording behavior according to the target occurrence number includes:
based on the target occurrence number being greater than a times threshold and a third video picture including both the recording device in the target form and the target part, acquiring, in the third video picture, fifth position information of the detection frame of the recording device in the target form and sixth position information of the detection frame of the target part, wherein the third video picture is the last video picture detected in the multi-frame video pictures;
determining a fourth intersection ratio of the target form of the recording device and the target part in the third video picture according to the fifth position information and the sixth position information;
and determining that the content of the video comprises recording behavior based on the fourth cross-over ratio being greater than a cross-over ratio threshold.
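The final decision combines both signals: the accumulated target occurrence count must exceed the times threshold, and in the last detected picture the device box and the target-part box must still overlap above the intersection-ratio threshold. A sketch (both thresholds illustrative):

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2]-a[0])*(a[3]-a[1]) + (b[2]-b[0])*(b[3]-b[1]) - inter
    return inter / union if union else 0.0

TIMES_THRESHOLD = 5  # illustrative
IOU_THRESHOLD = 0.1  # illustrative


def is_recording(target_count, device_box, part_box):
    if target_count <= TIMES_THRESHOLD:
        return False                            # device not present long enough
    fourth_ratio = iou(device_box, part_box)    # device vs. target part, last frame
    return fourth_ratio > IOU_THRESHOLD


print(is_recording(8, (10, 10, 50, 50), (30, 30, 70, 70)))  # True
print(is_recording(3, (10, 10, 50, 50), (30, 30, 70, 70)))  # False
```

Requiring both conditions avoids flagging a device that merely appears briefly, and avoids flagging one that lingers in the frame but is never held near the target part.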
In a possible implementation, after determining that the content of the video includes recording behavior based on the fourth intersection ratio being greater than the intersection-ratio threshold, the method further includes:
and adding a target mark at a position indicated by fifth position information of the third video picture, wherein the target mark is used for indicating that the recording equipment with the target form exists at the position indicated by the fifth position information.
In another aspect, an embodiment of the present application provides a recording detection apparatus, including:
the acquisition module is used for acquiring continuous multi-frame video pictures included in the video to be subjected to the shooting detection;
the determining module is used for determining an initial video picture in the multi-frame video pictures, wherein the initial video picture is a video picture of a shooting device with a target part and a target form appearing for the first time;
the determining module is further configured to determine a target cross-over ratio between a detection frame of the recording device including the target form and a detection frame including the target portion in the initial video frame;
the processing module is used for processing video pictures except the initial video picture in the multi-frame video pictures based on the target intersection ratio being larger than a reference threshold value to obtain the number of times of target occurrence of the shooting equipment in the target form in the video;
The determining module is further configured to determine whether the content of the video includes a recording behavior according to the number of occurrences of the target.
In a possible implementation manner, the processing module is configured to obtain an initial matrix according to a target number of frame video frames in the multi-frame video frames, where the initial matrix includes a sub-matrix of the target number of frame video frames, and the sub-matrix of any video frame is used to indicate position information of a detection frame of a recording device including the target form in the any video frame, and the target number of frame video frames is a continuous target number of frame video frames starting from the initial video frame;
acquiring first position information of a detection frame of the video recording device containing the target form in a first video picture, wherein the first video picture is a video picture which continuously appears after the target number of frames of video pictures in the multi-frame video picture;
acquiring position information of a detection frame of the photographing equipment containing the target form in a video picture corresponding to each submatrix according to the submatrices included in the initial matrix to obtain a plurality of second position information;
acquiring a first occurrence number of the photographing equipment of the target form from the initial video picture to the first video picture according to the first position information and the plurality of second position information;
and determining the target occurrence number of the video shooting equipment in the target form according to the initial matrix and the first occurrence number.
In a possible implementation manner, the processing module is configured to determine first cross-ratios between the detection frames determined by the first location information and the detection frames determined by the plurality of second location information, so as to obtain a target number of first cross-ratios;
determining a first number of first cross ratios of the target number of first cross ratios greater than a target threshold;
based on the first number being greater than a number threshold, taking a first number as a first number of occurrences of the recording device of the target modality from the initial video frame to the first video frame;
and based on the first number not larger than the number threshold, taking a second numerical value as a first occurrence number of the photographing device of the target form from the initial video picture to the first video picture, wherein the second numerical value is smaller than the first numerical value.
In a possible implementation manner, the processing module is configured to update the initial matrix according to a sub-matrix of the first video picture to obtain a target matrix, where the target matrix includes the sub-matrix of the first video picture;
acquiring third position information of a detection frame of the video recording equipment containing the target form in a second video picture, wherein the second video picture is a video picture which is behind the first video picture and is adjacent to the first video picture;
acquiring position information of a detection frame of the photographing equipment containing the target form in a video picture corresponding to each submatrix according to a plurality of submatrices included in the target matrix to obtain a plurality of fourth position information;
updating the first occurrence number according to the third position information and the plurality of fourth position information to obtain a second occurrence number of the shooting equipment in the target form from the initial video picture to the second video picture;
and traversing video pictures except the target number of frames of video pictures, the first video picture and the second video picture in the video according to the updating process to obtain the target occurrence times of the shooting equipment in the target form in the video.
In a possible implementation manner, the processing module is configured to delete a sub-matrix of an initial video picture in the initial matrix to obtain a reference matrix;
and acquiring the target matrix according to the reference matrix and the submatrices of the first video picture.
In a possible implementation manner, the processing module is configured to determine second cross-over ratios between the detection frames determined by the third location information and the detection frames determined by the plurality of fourth location information, to obtain a target number of second cross-over ratios;
determining a second number of second cross ratios greater than a target threshold among the target number of second cross ratios;
and updating the first occurrence times according to the second number to obtain a second occurrence times of the photographing equipment of the target form from the initial video picture to the second video picture.
In a possible implementation manner, the processing module is configured to update the first occurrence number according to the second number to obtain a reference number;
acquiring reference position information of a detection frame of the video recording device containing the target form in a reference video picture, wherein the reference video picture is a video picture which is in front of the second video picture and is separated from the second video picture by a target number of video pictures;
determining a third intersection ratio between the detection frame determined by the reference position information and the detection frame determined by the third position information;
and adjusting the reference number according to the third intersection ratio to obtain the second occurrence number of the recording device in the target form from the initial video picture to the second video picture.
In a possible implementation manner, the processing module is configured to add a third value based on the first occurrence number based on the second number being greater than a number threshold value, to obtain a reference number;
and adding a fourth numerical value based on the first occurrence number based on the second number not larger than the number threshold value to obtain a reference number, wherein the fourth numerical value is smaller than the third numerical value.
In a possible implementation, the processing module is configured to, based on the third intersection ratio being greater than a target threshold, take the reference number as the second occurrence number of the recording device in the target form from the initial video picture to the second video picture;
and adding a fifth numerical value on the basis of the reference number based on the third intersection ratio not larger than the target threshold value, so as to obtain a second occurrence number of the photographing equipment in the target form from the initial video picture to the second video picture.
In a possible implementation, the determining module is configured to, based on the target occurrence number being greater than a times threshold and a third video picture including both the recording device in the target form and the target part, acquire, in the third video picture, fifth position information of the detection frame of the recording device in the target form and sixth position information of the detection frame of the target part, wherein the third video picture is the last video picture detected in the multi-frame video pictures;
determining a fourth intersection ratio of the target form of the recording device and the target part in the third video picture according to the fifth position information and the sixth position information;
and determining that the content of the video comprises recording behavior based on the fourth cross-over ratio being greater than a cross-over ratio threshold.
In one possible implementation, the apparatus further includes:
and the adding module is used for adding a target mark at the position indicated by the fifth position information of the third video picture, wherein the target mark is used for indicating that the recording equipment with the target form exists at the position indicated by the fifth position information.
In another aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a processor and a memory, where the memory stores at least one program code, and the at least one program code is loaded and executed by the processor, so that the electronic device implements any one of the above-mentioned recording detection methods.
In another aspect, there is provided a computer readable storage medium having at least one program code stored therein, the at least one program code being loaded and executed by a processor to cause a computer to implement any of the above-described methods of detecting a video recording.
In another aspect, there is also provided a computer program or computer program product having stored therein at least one computer instruction that is loaded and executed by a processor to cause a computer to implement any of the above-described methods of recording detection.
The technical scheme provided by the embodiment of the application at least brings the following beneficial effects:
according to the technical scheme, the initial video picture is determined in the continuous multi-frame video pictures included in the video, when the target cross ratio between the detection frame of the video equipment with the target form and the detection frame with the target position included in the initial video picture is larger than the reference threshold, the multi-frame video pictures included in the video are tracked, so that the number of times of target occurrence of the video equipment with the target form in the video is obtained, and whether the content of the video includes the shooting behavior is determined according to the number of times of target occurrence. The method does not need manual participation, saves time required by the video recording detection, improves the video recording detection efficiency, and improves the video recording detection accuracy because whether the video content comprises the video recording behavior is determined by tracking the whole video, and whether the video content comprises the video recording behavior is not determined based on a certain frame of video picture in the video.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic view of an implementation environment of a method for detecting a video recording according to an embodiment of the present application;
fig. 2 is a flowchart of a method for detecting a video recording according to an embodiment of the present application;
fig. 3 is a schematic diagram of a detection frame of a recording device including a target form in an initial video frame and a detection frame including a target portion in the initial video frame according to an embodiment of the present application;
fig. 4 is a schematic diagram of a detection frame of a recording apparatus including a target form in an initial video frame and a detection frame including a target portion in the initial video frame according to another embodiment of the present application;
fig. 5 is a flowchart of a method for detecting a video recording according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a recording detection apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic view of an implementation environment of a method for detecting video recording according to an embodiment of the present application. As shown in fig. 1, the implementation environment includes an electronic device 101. The electronic device 101 may be a terminal device or a server, which is not limited in the embodiments of the present application. The electronic device 101 is configured to execute the recording detection method provided in the embodiments of the present application.
Optionally, when the electronic device 101 is a terminal device, the terminal device may be any electronic product that can perform human-machine interaction with a user through one or more of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, or a handwriting device, for example, a PC (Personal Computer), a mobile phone, a smart phone, a PDA (Personal Digital Assistant), a wearable device, a PPC (Pocket PC), a tablet computer, a smart car terminal, a smart television, a smart speaker, and the like. When the electronic device 101 is a server, the server may be a single server, a server cluster formed by a plurality of servers, or a cloud computing service center. The terminal device establishes a communication connection with the server through a wired network or a wireless network.
It will be appreciated by those skilled in the art that the above terminal devices and servers are merely illustrative; other terminal devices or servers, now existing or hereafter developed, that are applicable to the present application are also intended to be within the scope of protection of the present application.
The embodiments of the present application provide a method for detecting video recording, which can be applied to the above implementation environment. Taking the flowchart of the method shown in fig. 2 as an example, the method may be executed by the electronic device 101 in fig. 1. As shown in fig. 2, the method includes the following steps 201 to 205.
In step 201, the consecutive multi-frame video pictures included in a video to be subjected to recording detection are acquired.
In one possible implementation, the storage space of the electronic device stores the consecutive multi-frame video pictures included in the video to be subjected to recording detection, and the electronic device acquires these video pictures from its storage space. The video to be subjected to recording detection may be a video that has already been recorded or a video that is being recorded, which is not limited in the embodiments of the present application.
Alternatively, the electronic device acquires the video to be subjected to recording detection and performs framing processing on it to obtain the consecutive multi-frame video pictures included in the video. The manner of acquiring the video to be subjected to recording detection includes, but is not limited to, the following four.
In the first mode, a plurality of candidate videos are stored in the electronic device, and any one candidate video is used as a video to be subjected to video recording detection.
And in the second mode, the electronic equipment is terminal equipment, a plurality of candidate videos are stored in the terminal equipment, and the selected candidate videos are used as videos to be subjected to video shooting detection.
In a third mode, the electronic device is a server, the server and the terminal device are in communication connection through a wired network or a wireless network, a plurality of candidate videos are stored in the terminal device, the terminal device takes the selected candidate videos as videos to be subjected to video shooting detection, and the terminal device sends the videos to be subjected to video shooting detection to the server so that the server can acquire the videos to be subjected to video shooting detection.
In a fourth mode, the electronic device is a terminal device, and a first application program for video acquisition is installed and operated in the terminal device. The terminal equipment calls a first application program to collect videos, and the collected videos are used as videos to be subjected to video recording detection.
The first application may be any application capable of capturing video, which is not limited in this embodiment of the present application. Illustratively, the first application is a camera. The acquired video may be a video that has been acquired, or may be a video that is being acquired, which is not limited in this embodiment.
It should be noted that any of the above methods may be selected to obtain the video to be subjected to the recording detection, which is not limited in the embodiment of the present application.
In one possible implementation manner, the second application program for video framing is installed and running in the electronic device, and the second application program may be any program capable of performing video framing operation, which is not limited in the embodiment of the present application. Illustratively, the second application is video editing software (Premiere Pro, PR). After the electronic equipment acquires the video to be subjected to the video shooting detection, a second application program is called to carry out framing processing on the video to be subjected to the video shooting detection, and continuous multi-frame video pictures included in the video are obtained. Illustratively, the video to be subjected to the camcorder detection is subjected to framing processing, resulting in continuous 7-frame video pictures included in the video.
In step 202, an initial video picture is determined from the multi-frame video pictures, the initial video picture being the video picture in which a target portion and a recording device in a target form appear for the first time.
The target portion is a hand, and a recording device in the target form means a recording device whose front squarely faces the lens, that is, the initial video picture includes a front-and-back image of the recording device. The recording device is any device capable of photographing or recording, which is not limited in the embodiment of the present application. Illustratively, the recording device is a mobile phone, or the recording device is a camera. For example, if the recording device is a mobile phone, a recording device in the target form means the mobile phone squarely facing the lens, that is, the video picture includes a front-and-back image of the mobile phone.
In one possible implementation, the process of determining the initial video picture from the multi-frame video pictures includes: detecting each frame of video picture to obtain the video content included in each frame of video picture; determining, from the multi-frame video pictures, the video pictures whose video content includes a recording device and the target portion; classifying the recording device included in each video picture whose video content includes a recording device and the target portion, to obtain the form of the recording device included in that video picture; and taking, among the video pictures whose video content includes a recording device and the target portion, the video picture in which the form of the recording device is the target form and whose appearance time is earliest as the initial video picture.
Illustratively, a target detection network is invoked to detect each frame of video picture to obtain the video content included in each frame of video picture, and a target classification network is invoked to classify the recording device included in each video picture whose video content includes a recording device and the target portion, to obtain the form of the recording device included in that video picture.
The target detection network needs to be acquired before it is invoked. The process of acquiring the target detection network comprises the following steps: acquiring a first training data set and an initial detection network, wherein the first training data set comprises a first image and image contents contained in the first image, the first image comprises an image of a photographing device and an image of a target part, and the image contents contained in the first image comprise the photographing device and the target part; the initial detection network is any network capable of content detection, and is illustratively YOLO (You Only Look Once, a detection network). Training the initial detection network according to the first training data set to obtain a target detection network.
Optionally, training the initial detection network according to the first training data set, and the process of obtaining the target detection network includes: and carrying out data enhancement on the first image in the first training data set to obtain a first training data set after data enhancement, and training the initial detection network according to the first training data set after data enhancement to obtain a target detection network. The data enhancement mode of the first image includes, but is not limited to, random scaling, data normalization, image stitching and the like.
The target classification network needs to be acquired before it is invoked. The process of acquiring the target classification network includes: acquiring a second training data set and an initial classification network, the second training data set including a second image, the form of the second image, a third image, and the form of the third image. The second image and the third image are both images of a recording device; the form of the second image is front-and-back (the device squarely facing the lens), and the form of the third image is non-front-and-back. The initial classification network is any network capable of content classification, and is illustratively a residual neural network (ResNet). The initial classification network is trained according to the second training data set to obtain the target classification network. Optionally, according to the second training data set, the parameters of the initial classification network are updated by adopting a learning strategy of pre-trained weights combined with fine-tuning (fine-tune), so as to obtain the target classification network.
Optionally, training the initial classification network according to the second training data set, and the process of obtaining the target classification network includes: and when the number of the second images and the third images included in the second training data set is unbalanced, extracting the second images and the third images from the second training data set in a random sampling mode, wherein the number of the extracted second images and the number of the extracted third images are the same. Training the initial classification network according to the extracted second image, the extracted form of the third image and the extracted form of the third image to obtain a target classification network. And the extracted second image and the extracted third image can be subjected to data enhancement, and the initial classification network is trained according to the image after data enhancement, the form of the second image and the form of the third image, so that the target classification network is obtained. The data enhancement mode of the extracted second image and the extracted third image includes, but is not limited to, random clipping, random horizontal flipping, random vertical flipping, random Gaussian blur, scaling, left-right or up-down random zero padding and the like.
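The random-sampling balancing step above can be sketched as follows; the helper name and the `second_images`/`third_images` parameters are illustrative, not identifiers from this application.

```python
import random

def balance_classes(second_images, third_images, seed=0):
    # Draw the same number of samples from each class by random
    # sampling, capped at the size of the smaller class.
    rng = random.Random(seed)
    n = min(len(second_images), len(third_images))
    return rng.sample(second_images, n), rng.sample(third_images, n)
```

The balanced lists (optionally after data enhancement) would then be used to train the initial classification network.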
In step 203, a target cross-over ratio (i.e., intersection over union, IoU) between the detection frame of the recording device in the target form and the detection frame of the target portion in the initial video picture is determined.
Optionally, the process of determining the target cross-over ratio between the detection frame of the recording device in the target form and the detection frame of the target portion in the initial video picture includes: if an overlapping region exists between the detection frame of the recording device in the target form and the detection frame of the target portion in the initial video picture, determining a first area, namely the area of the overlapping region between the two detection frames; determining a second area, namely the area of the figure formed by the two detection frames; and taking the ratio between the first area and the second area as the target cross-over ratio. If no overlapping region exists between the detection frame of the recording device in the target form and the detection frame of the target portion in the initial video picture, 0 is taken as the target cross-over ratio.
Fig. 3 is a schematic diagram of a detection frame of a recording device in the target form and a detection frame of the target portion in an initial video picture according to an embodiment of the present application, where 301 is the detection frame of the recording device in the target form and 302 is the detection frame of the target portion. Since there is an overlapping region between 301 and 302 (the hatched region in fig. 3), the ratio of the area of the hatched region to the area of the figure formed by the two detection frames is taken as the target cross-over ratio between the detection frame of the recording device in the target form and the detection frame of the target portion in the initial video picture.
Fig. 4 is a schematic diagram of a detection frame of a recording device in the target form and a detection frame of the target portion in an initial video picture according to another embodiment of the present application, where 401 is the detection frame of the recording device in the target form and 402 is the detection frame of the target portion. Since there is no overlapping region between 401 and 402, 0 is taken as the target cross-over ratio between the two detection frames.
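The cross-over-ratio computation of step 203 (overlap area divided by the area of the combined figure, with 0 when the frames do not overlap) can be sketched as a small function. The corner-coordinate box format `(x1, y1, x2, y2)` and the function name are assumptions for illustration, not part of the application.

```python
def target_iou(box_a, box_b):
    # Each detection frame is (x1, y1, x2, y2): upper-left and
    # lower-right corners. Returns 0 when the frames do not overlap.
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)   # first area
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                     # second area
    return inter / union if union > 0 else 0.0
```

For the boxes in fig. 3 the value is positive; for the disjoint boxes of fig. 4 it is 0.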
In step 204, based on the target cross-over ratio being greater than the reference threshold, the video pictures other than the initial video picture among the multi-frame video pictures are processed to obtain the target occurrence number of the recording device in the target form in the video.
In one possible implementation, the reference threshold is set empirically or adjusted according to the implementation environment, which is not limited by the embodiments of the present application. The process for processing the video pictures except the initial video picture in the multi-frame video pictures to obtain the number of times of target occurrence of the shooting equipment in the target form in the video comprises the following steps: acquiring an initial matrix according to the target number of frames of video pictures in a plurality of frames of video pictures; acquiring a first occurrence number of the shooting equipment in the target form from the initial video picture to the first video picture according to the first video picture and the initial matrix; and determining the target occurrence number of the video shooting equipment in the target form according to the initial matrix and the first occurrence number.
The target number is set based on experience, or is adjusted according to the total frame number of the video frames included in the video, which is not limited in the embodiment of the present application. The target number is greater than zero and less than the total number of frames of the video frames that the video includes. The initial matrix comprises a submatrix of the target number of frames of video pictures, and the submatrix of any video picture is used for indicating the position information of a detection frame of the video equipment containing the target form in any video picture. The target number of frames of video is a continuous target number of frames of video starting with the initial video. The first video picture is a video picture that continuously appears after the target number of frames of video pictures in the multi-frame video picture.
Illustratively, if the target number is 3 and the initial video picture is the first frame video picture, the target number of frames of video pictures are the first, second, and third frame video pictures, and the first video picture is the fourth frame video picture. For another example, if the target number is 3 and the initial video picture is the fourth frame video picture, the target number of frames of video pictures are the fourth, fifth, and sixth frame video pictures, and the first video picture is the seventh frame video picture.
In one possible implementation, the process of obtaining the initial matrix according to a target number of frames of video frames of the multi-frame video frames includes: processing the video pictures of the target number of frames to obtain sub-matrixes respectively corresponding to the video pictures of the target number of frames; and obtaining an initial matrix according to the sub-matrixes respectively corresponding to the target number of frames of video pictures. Optionally, stacking the sub-matrixes respectively corresponding to the video frames of the target number to obtain an initial matrix. The initial matrix records the video pictures of the target number of frames, and provides a basis for the subsequent determination of the occurrence times of the photographing equipment.
Optionally, the process of processing the target number of frames of video pictures to obtain the sub-matrices respectively corresponding to them includes: for any frame among the target number of frames of video pictures, based on that frame including a recording device whose form is the target form, taking the matrix corresponding to the position information of the detection frame of the recording device in the target form as the sub-matrix corresponding to that frame; and based on that frame including a recording device whose form is not the target form, or not including a recording device, taking a first matrix as the sub-matrix corresponding to that frame.
The first matrix is set based on experience, or adjusted according to the implementation environment, which is not limited in the embodiment of the present application. Illustratively, the first matrix is [0, 0]. The matrix corresponding to the position information of the detection frame of the recording device in the target form is a matrix composed of that position information. The position information of the detection frame may be the position information of the upper left corner and the lower right corner of the detection frame, the position information of the upper right corner and the lower left corner of the detection frame, or the position information of the center point of the detection frame together with the length and width of the detection frame, which is not limited in the embodiment of the present application. For example, if the position information of the upper left corner of the detection frame of the recording device in the target form is (X1, Y1) and the position information of the lower right corner is (X2, Y2), the matrix composed of the position information of the detection frame is [X1, Y1, X2, Y2].
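A minimal sketch of building per-frame sub-matrices and stacking them into the initial matrix, assuming corner-coordinate position information. The text shows a two-element first matrix [0, 0]; this sketch pads to four zeros so that every sub-matrix has the same length, and that padding choice is an assumption.

```python
def frame_submatrix(detection):
    # detection: (x1, y1, x2, y2) of the target-form recording device's
    # detection frame, or None when the frame has no device in the
    # target form (or no device at all).
    if detection is None:
        return [0, 0, 0, 0]          # "first matrix" (padded, see note)
    x1, y1, x2, y2 = detection
    return [x1, y1, x2, y2]

def build_initial_matrix(detections):
    # Stack the sub-matrices of the target number of frames, starting
    # with the initial video picture.
    return [frame_submatrix(d) for d in detections]
```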
In one possible implementation, for any frame among the target number of frames of video pictures, that frame is detected to obtain the video content it includes; and based on the video content, the recording device included in that frame is classified to obtain its form. This procedure is similar to the procedure of determining the initial video picture in step 202, and will not be described herein.
Optionally, the process of obtaining, according to the first video picture and the initial matrix, the first occurrence number of the recording device in the target form from the initial video picture to the first video picture includes: acquiring first position information of the detection frame of the recording device in the target form in the first video picture; acquiring, according to the plurality of sub-matrices included in the initial matrix, the position information of the detection frame of the recording device in the target form in the video picture corresponding to each sub-matrix, to obtain a plurality of pieces of second position information; and acquiring, according to the first position information and the plurality of pieces of second position information, the first occurrence number of the recording device in the target form from the initial video picture to the first video picture.
The process of acquiring the first position information of the detection frame of the recording device in the target form in the first video picture includes: taking the position information of the detection frame of the recording device in the target form included in the first video picture as the first position information. For example, if the position information of the upper left corner of that detection frame is (X1, Y1) and the position information of the lower right corner is (X2, Y2), (X1, Y1) and (X2, Y2) are taken as the first position information.
The process of acquiring, according to the plurality of sub-matrices included in the initial matrix, the position information of the detection frame of the recording device in the target form in the video picture corresponding to each sub-matrix includes: taking the position information corresponding to any sub-matrix in the initial matrix as one piece of second position information. For example, if one sub-matrix in the initial matrix is [X3, Y3, X4, Y4], (X3, Y3) and (X4, Y4) are taken as the second position information of the detection frame of the recording device in the target form in the video picture corresponding to that sub-matrix.
Optionally, the process of acquiring, according to the first position information and the plurality of pieces of second position information, the first occurrence number of the recording device in the target form from the initial video picture to the first video picture includes: determining first cross-over ratios between the detection frame determined by the first position information and the detection frames respectively determined by the plurality of pieces of second position information, to obtain a target number of first cross-over ratios; determining a first number of first cross-over ratios, among the target number of first cross-over ratios, that are greater than a target threshold; based on the first number being greater than a number threshold, taking a first numerical value as the first occurrence number of the recording device in the target form from the initial video picture to the first video picture; and based on the first number being not greater than the number threshold, taking a second numerical value as the first occurrence number, wherein the second numerical value is smaller than the first numerical value.
The target threshold, the number threshold, the first numerical value, and the second numerical value are all set based on experience, or adjusted according to the implementation environment, which is not limited in the embodiment of the present application. Optionally, the number threshold is determined based on an occurrence frequency and the target number; for example, the number threshold is the product of the occurrence frequency and the target number. The occurrence frequency measures how frequently the recording device in the target form appears in consecutive frames of the video. The target threshold measures the movement amplitude allowed for the recording device in the target form: the smaller the target threshold, the larger the allowed movement amplitude, and conversely, the larger the target threshold, the smaller the allowed movement amplitude. Illustratively, the target threshold is 80%, the number threshold is 2, the first numerical value is 1, and the second numerical value is 0.
The process of determining the first cross-over ratios between the detection frame determined by the first position information and the detection frames respectively determined by the plurality of pieces of second position information includes: for any piece of second position information, if an overlapping region exists between the detection frame determined by the first position information and the detection frame determined by that second position information, determining a third area, namely the area of the overlapping region between the two detection frames; determining a fourth area, namely the area of the figure formed by the two detection frames; and taking the ratio between the third area and the fourth area as the first cross-over ratio between the two detection frames. If no overlapping region exists between the two detection frames, 0 is taken as the first cross-over ratio between them.
According to the above method for determining the first occurrence number, on one hand, judging the cross-over ratios of the recording device between the first video picture and the target number of frames of video pictures simulates the spatial characteristics of recording behavior, namely that the recording device is kept squarely aligned with the camera while shaking only slightly; on the other hand, calculating the occurrence number by combining the first video picture with the target number of frames of video pictures exploits the temporal characteristics of the video, so that the determination of the first occurrence number is more accurate. In addition, the judging method combining the information of the first video picture and the target number of frames of video pictures increases fault tolerance against missed detections of the recording device.
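Under the illustrative values above (target threshold 80%, number threshold 2, first numerical value 1, second numerical value 0), the first occurrence number could be computed as follows. All names are hypothetical, and the cross-over ratio is inlined so the sketch is self-contained.

```python
def first_occurrence_count(first_box, submatrices,
                           target_threshold=0.8, number_threshold=2,
                           first_value=1, second_value=0):
    def iou(a, b):
        # Cross-over ratio between two corner-coordinate boxes.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    # First number: sub-matrix detection frames whose cross-over ratio
    # with the first video picture's frame exceeds the target threshold;
    # zero "first matrix" entries (no device detected) never qualify.
    first_number = sum(
        1 for sub in submatrices
        if sub != [0, 0, 0, 0] and iou(first_box, sub) > target_threshold
    )
    return first_value if first_number > number_threshold else second_value
```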
In one possible implementation, the determining, according to the initial matrix and the first occurrence number, a target occurrence number of the video capturing device in the target form includes: updating the initial matrix according to the submatrices of the first video picture to obtain a target matrix, wherein the target matrix comprises the submatrices of the first video picture; updating the first occurrence number according to the second video picture and the target matrix to obtain a second occurrence number of the shooting equipment in the target form from the initial video picture to the second video picture; traversing video pictures except for the target number of frames of video pictures, the first video picture and the second video picture in the video according to the updating process to obtain the target occurrence times of the shooting equipment in the target form in the video.
The second video picture is a video picture which is behind and adjacent to the first video picture. Illustratively, the first video picture is a fourth frame video picture, and the second video picture is a fifth frame video picture.
Before updating the initial matrix according to the sub-matrix of the first video frame, the sub-matrix of the first video frame needs to be acquired, and the acquisition process of the sub-matrix of the first video frame is similar to the acquisition process of the sub-matrix corresponding to the video frames of the target number in the above steps, and will not be described in detail.
Optionally, updating the initial matrix according to the sub-matrix of the first video frame, and the process of obtaining the target matrix includes: deleting the submatrices of the initial video pictures in the initial matrix to obtain a reference matrix; and acquiring a target matrix according to the reference matrix and the submatrices of the first video picture. For example, the reference matrix and the sub-matrix of the first video picture are stacked to obtain the target matrix.
Illustratively, the initial matrix includes a sub-matrix of the first frame video picture, the second frame video picture and the third frame video picture, and the first frame video picture is the initial video picture, so that the sub-matrix of the first frame video picture in the initial matrix is deleted to obtain a reference matrix, and the reference matrix includes the sub-matrix of the second frame video picture and the sub-matrix of the third frame video picture. And stacking the reference matrix and the submatrices of the first video picture to obtain a target matrix, wherein the target matrix comprises the submatrices of the second frame of video picture, the submatrices of the third frame of video picture and the submatrices of the fourth frame of video picture (namely the submatrices of the first video picture).
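The matrix update described above (delete the initial video picture's sub-matrix, append the first video picture's sub-matrix) behaves like a sliding window over the most recent target number of frames; a sketch using a deque, with names chosen for illustration:

```python
from collections import deque

def update_matrix(matrix, new_submatrix):
    # Delete the oldest (initial video picture's) sub-matrix and append
    # the newly processed picture's sub-matrix, keeping the window at
    # the target number of frames.
    window = deque(matrix)
    window.popleft()
    window.append(new_submatrix)
    return list(window)
```

In the worked example, applying this to the matrix of the first three frames with the fourth frame's sub-matrix yields the target matrix covering frames two through four.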
Optionally, the updating the first occurrence number according to the second video frame and the target matrix to obtain a second occurrence number of the photographing device in the target form between the initial video frame and the second video frame includes: acquiring third position information of a detection frame of the photographing equipment containing the target form in a second video picture; acquiring position information of a detection frame of the photographing equipment containing the target form in a video picture corresponding to each submatrix according to the plurality of submatrices included in the target matrix to obtain a plurality of fourth position information; and updating the first appearance times according to the third position information and the plurality of fourth position information to obtain the second appearance times of the shooting equipment in the target form between the initial video picture and the second video picture.
In a possible implementation manner, the process of acquiring the third position information is similar to the process of acquiring the first position information in the above step, and the process of acquiring the fourth position information is similar to the process of acquiring the second position information in the above step, which is not described herein.
Optionally, the updating the first occurrence number according to the third position information and the plurality of fourth position information to obtain the second occurrence number of the target form of the recording device between the initial video frame and the second video frame includes: determining second cross-over ratios between the detection frames determined by the third position information and the detection frames determined by the fourth position information respectively to obtain a target number of second cross-over ratios; determining a second number of second cross ratios of the target number of second cross ratios greater than the target threshold; and updating the first occurrence number according to the second number to obtain a second occurrence number of the shooting equipment in the target form between the initial video picture and the second video picture.
The process of determining the second cross-over ratio between the detection frame determined by the third position information and the detection frames determined by the fourth position information is similar to the process of determining the first cross-over ratio between the detection frame determined by the first position information and the detection frames determined by the second position information in the above steps, and will not be described herein.
In the embodiment of the present application, the manner of updating the first occurrence number according to the second number to obtain the second occurrence number of the recording device in the target form between the initial video picture and the second video picture is not limited. Optionally, the first occurrence number is updated according to the second number through the following two implementations.
In the first implementation, based on the second number being greater than the number threshold, a third value is added to the first occurrence number to obtain the second occurrence number of the recording device of the target form between the initial video picture and the second video picture; based on the second number being not greater than the number threshold, a reference value is added to the first occurrence number instead.
The third value and the reference value are set based on experience or adjusted according to the implementation environment, which is not limited in the embodiment of the present application; the reference value is smaller than the third value. Illustratively, the third value is 1 and the reference value is -1.
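The first implementation reduces to a single branched update. The sketch below uses the illustrative values from the text (third value 1, reference value -1); the function name is an assumption:

```python
def update_occurrence(first_number, second_number, number_threshold,
                      third_value=1, reference_value=-1):
    """First implementation: add `third_value` when enough of the tracked
    detection frames match the new one, otherwise add `reference_value`
    (which may be negative, decaying stale tracks)."""
    if second_number > number_threshold:
        return first_number + third_value
    return first_number + reference_value
```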
In the second implementation, the first occurrence number is updated according to the second number to obtain a reference number, and reference position information of the detection frame containing the recording device of the target form is acquired in a reference video picture, the reference video picture being the video picture that precedes the second video picture by the target number of frames; a third intersection ratio between the detection frame determined by the reference position information and the detection frame determined by the third position information is determined; and the reference number is adjusted according to the third intersection ratio, to obtain the second occurrence number of the recording device of the target form between the initial video picture and the second video picture.
Optionally, where the target number of frames of video pictures are the fourth, fifth and sixth video pictures, the first video picture is the seventh video picture and the second video picture is the eighth video picture, the reference video picture is the fifth video picture.
Updating the first occurrence number according to the second number to obtain the reference number includes: based on the second number being greater than the number threshold, adding a third value to the first occurrence number to obtain the reference number; and based on the second number being not greater than the number threshold, adding a fourth value, smaller than the third value, to the first occurrence number to obtain the reference number. The third value and the fourth value are set based on experience or adjusted according to the implementation environment, which is not limited in the embodiment of the present application. Illustratively, the third value is 1 and the fourth value is 0.
The process of determining the third intersection ratio between the detection frame determined by the reference position information and the detection frame determined by the third position information is similar to the process, in the above steps, of determining the first intersection ratios between the detection frame determined by the first position information and the detection frames determined by the plurality of second position information, and is not repeated here.
Optionally, adjusting the reference number according to the third intersection ratio to obtain the second occurrence number of the recording device of the target form between the initial video picture and the second video picture includes: based on the third intersection ratio being greater than the target threshold, taking the reference number as the second occurrence number; and based on the third intersection ratio being not greater than the target threshold, adding a fifth value to the reference number to obtain the second occurrence number.
The target threshold and the fifth value are set based on experience or adjusted according to the implementation environment, which is not limited in the embodiment of the present application. Illustratively, the target threshold is 80% and the fifth value is -1.
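The second implementation can be sketched as follows, again with the text's illustrative figures (third value 1, fourth value 0, fifth value -1, target threshold 80%); `third_ratio` stands for the third intersection ratio against the reference video picture, and all names are assumptions:

```python
def update_with_reference(first_number, second_number, number_threshold,
                          third_ratio, target_threshold=0.8,
                          third_value=1, fourth_value=0, fifth_value=-1):
    """Second implementation: update the count to a reference number, then
    adjust it by how well the new detection frame overlaps the detection
    frame in the reference video picture."""
    reference_number = first_number + (
        third_value if second_number > number_threshold else fourth_value)
    if third_ratio > target_threshold:
        return reference_number            # stable track: keep the reference number
    return reference_number + fifth_value  # weak overlap: penalize the count
```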
It should be noted that the second occurrence number of the recording device of the target form between the initial video picture and the second video picture may be obtained by either implementation, which is not limited in the embodiment of the present application.
It should be further noted that the process of determining the target occurrence number of the recording device of the target form in the video is similar to the process of determining its second occurrence number between the initial video picture and the second video picture, and is not described in detail in the embodiment of the present application.
In step 205, whether the content of the video includes recording behavior is determined according to the target occurrence number.
In one possible implementation, there are two implementations for determining, according to the target occurrence number, whether the content of the video includes recording behavior.
In the first implementation, it is determined that the content of the video includes recording behavior based on the target occurrence number being greater than a times threshold, and that it does not include recording behavior based on the target occurrence number being not greater than the times threshold.
The times threshold is set based on experience or adjusted according to the implementation environment, which is not limited in the embodiment of the present application. Illustratively, the times threshold is 3.
In the second implementation, based on the target occurrence number being greater than the times threshold and a third video picture including both the recording device of the target form and the target part, fifth position information of the detection frame of the recording device of the target form and sixth position information of the detection frame of the target part are acquired in the third video picture; a fourth intersection ratio between the recording device of the target form and the target part in the third video picture is determined according to the fifth position information and the sixth position information; and it is determined that the content of the video includes recording behavior based on the fourth intersection ratio being greater than an intersection ratio threshold.
The third video picture is the last video picture to be detected among the multi-frame video pictures. The process of determining the fourth intersection ratio according to the fifth position information and the sixth position information is similar to the process of determining the first intersection ratio in the above steps, and is not repeated here. The fourth intersection ratio indicates the relative positional relationship between the recording device of the target form and the target part in the third video picture. The intersection ratio threshold is set based on experience or adjusted according to the implementation environment, which is not limited in the embodiment of the present application; illustratively, it is 50%.
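Under this second implementation, the final decision combines the occurrence count with the overlap in the last detected video picture. A sketch with the illustrative thresholds (times threshold 3, intersection ratio threshold 50%); the names are assumptions:

```python
def includes_recording_behavior(target_number, fourth_ratio,
                                times_threshold=3, ratio_threshold=0.5):
    """The video is flagged only when the target-form recording device
    appeared often enough AND, in the third (last detected) video picture,
    its detection frame overlaps the target part strongly enough."""
    return target_number > times_threshold and fourth_ratio > ratio_threshold
```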
In one possible implementation, it is determined that the content of the video does not include recording behavior based on the fourth intersection ratio being not greater than the intersection ratio threshold. Based on the target occurrence number being greater than the times threshold but the third video picture lacking at least one of the recording device of the target form and the target part, seventh position information of the detection frame containing the recording device of the target form and eighth position information of the detection frame containing the target part are acquired in a candidate video picture. A fifth intersection ratio between the recording device of the target form and the target part in the candidate video picture is then determined according to the seventh position information and the eighth position information; it is determined that the content of the video includes recording behavior based on the fifth intersection ratio being greater than the intersection ratio threshold, and that it does not based on the fifth intersection ratio being not greater than the intersection ratio threshold. The candidate video picture is the detected video picture that precedes and is nearest to the third video picture and that includes both the recording device of the target form and the target part.
The process of determining the fifth intersection ratio between the recording device of the target form and the target part in the candidate video picture according to the seventh position information and the eighth position information is similar to the process of determining the first intersection ratio in the above steps, and is not repeated here.
In one possible implementation, based on the content of the video including recording behavior and the third video picture including both the recording device of the target form and the target part, a target mark is added at the position indicated by the fifth position information in the third video picture, the target mark indicating that a recording device of the target form exists at that position. The target mark may be any mark, which is not limited in this embodiment; illustratively, it is a red dot.
Optionally, based on the third video picture lacking at least one of the recording device of the target form and the target part while the content of the video includes recording behavior, a reference mark is added at the position indicated by the seventh position information in the candidate video picture, the reference mark indicating that a recording device of the target form exists at that position. The reference mark may be any mark, which is not limited in the embodiment of the present application. Illustratively, the reference mark is a green dot.
Optionally, where the electronic device is a terminal device, after adding the target mark at the position indicated by the fifth position information in the third video picture, the terminal device may display the third video picture with the target mark, so that the user knows that a recording device of the target form exists at the marked position. Alternatively, where the electronic device is a server communicatively connected to a terminal device through a wired or wireless network, after adding the target mark, the server sends the marked third video picture to the terminal device, which receives and displays it, so that the user knows that a recording device of the target form exists at the marked position.
Optionally, where the electronic device is a terminal device, after adding the reference mark at the position indicated by the seventh position information in the candidate video picture, the terminal device may display the candidate video picture with the reference mark, so that the user knows that a recording device of the target form exists at the marked position. Alternatively, where the electronic device is a server communicatively connected to a terminal device through a wired or wireless network, after adding the reference mark, the server sends the marked candidate video picture to the terminal device, which receives and displays it, so that the user knows that a recording device of the target form exists at the marked position.
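Adding the target or reference mark is a simple overlay at the detection frame's position. The sketch below stamps a solid dot into a frame held as nested lists of RGB tuples; in practice a drawing call such as OpenCV's `cv2.circle` would be used, and the dot radius is an assumption:

```python
RED, GREEN = (255, 0, 0), (0, 255, 0)  # target mark, reference mark

def add_mark(frame, box, color, radius=2):
    """Stamp a dot of `color` at the center of detection frame `box`,
    given as (x1, y1, x2, y2), onto `frame` (height x width RGB lists)."""
    cx, cy = (box[0] + box[2]) // 2, (box[1] + box[3]) // 2
    for y in range(max(0, cy - radius), min(len(frame), cy + radius + 1)):
        for x in range(max(0, cx - radius), min(len(frame[0]), cx + radius + 1)):
            if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2:
                frame[y][x] = color
    return frame
```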
According to the method, an initial video picture is determined among the continuous multi-frame video pictures included in the video; when the target intersection ratio between the detection frame of the recording device of the target form and the detection frame of the target part in the initial video picture is greater than the reference threshold, the multi-frame video pictures are tracked to obtain the target occurrence number of the recording device of the target form in the video, and whether the content of the video includes recording behavior is determined according to that number. The method requires no manual participation, which saves the time required for recording detection and improves its efficiency; and because the determination is made by tracking the whole video rather than from a single video picture, the accuracy of recording detection is also improved.
Fig. 5 is a flowchart of a method for detecting a video recording according to an embodiment of the present application, where the method includes the following steps.
Step 501, a continuous multi-frame video picture included in a video to be subjected to video recording detection is acquired.
In a possible implementation, the process of acquiring the continuous multi-frame video pictures included in the video to be subjected to recording detection is described in the above step 201 and is not repeated here.
Step 502, a target detection network is called to process each frame of video picture, to obtain the video content included in each frame of video picture.
In a possible implementation, the process of acquiring the video content included in each frame of video picture is described in the above step 202 and is not repeated here.
Step 503, the video pictures whose video content includes both a recording device and the target part are determined among the multi-frame video pictures.
In a possible implementation, this determination process is described in the above step 202 and is not repeated here.
Step 504, a target classification network is called to classify the video pictures that include the recording device and the target part, to obtain the form of the recording device in each such video picture.
In a possible implementation, the process of determining the form of the recording device included in a video picture containing the recording device and the target part is described in the above step 202 and is not repeated here.
Step 505, the initial video picture is determined according to the form of the recording device in the video pictures that include the recording device and the target part.
In a possible implementation, the process of determining the initial video picture is described in the above step 202 and is not repeated here.
Step 506, the target intersection ratio between the detection frame of the recording device of the target form and the detection frame of the target part in the initial video picture is determined.
In one possible implementation, this process is described in the above step 203 and is not repeated here.
Step 507, based on the target intersection ratio being greater than the reference threshold, an initial matrix is acquired according to t frames of video pictures.
In one possible implementation, t is greater than zero and less than the total number of video pictures included in the video, and the t frames of video pictures are the continuous t frames starting from the initial video picture. The process of obtaining the initial matrix is described in step 204 above and is not repeated here.
Step 508, the sub-matrix of the (t+1)-th frame of video picture is obtained.
In a possible implementation, this process is described in the above step 204 and is not repeated here.
Step 509, according to the sub-matrix of the (t+1)-th frame of video picture and the initial matrix, the first occurrence number of the recording device of the target form from the initial video picture to the (t+1)-th frame of video picture is obtained.
In a possible implementation, this process is described in the above step 204 and is not repeated here.
Step 510, the first occurrence number is updated to obtain the target occurrence number of the recording device of the target form.
In a possible implementation, the process of determining the target occurrence number of the recording device of the target form in the video is similar to that of step 205 and is not repeated here.
Step 511, whether the content of the video includes recording behavior is determined according to the target occurrence number.
In a possible implementation, the process of determining, according to the target occurrence number, whether the content of the video includes recording behavior is described in the above step 205 and is not repeated here.
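Steps 501-511 can be condensed into the following sketch; `detect` (returning the device and part detection frames or None), `classify` (True for the target form), `intersection_ratio`, and the threshold bundle are all assumptions standing in for the target detection network, the target classification network and the matrix bookkeeping of steps 507-510:

```python
def detect_recording(frames, detect, classify, intersection_ratio, thresholds):
    """Condensed pipeline of steps 501-511 (the sub-matrix bookkeeping behind
    the occurrence count is simplified to tracking the previous detection frame)."""
    # Steps 502-505: the initial video picture is the first one containing
    # both a target-form recording device and the target part.
    start = None
    for i, frame in enumerate(frames):
        if detect(frame) is not None and classify(frame):
            start = i
            break
    if start is None:
        return False
    device_box, part_box = detect(frames[start])
    # Step 506: the target intersection ratio in the initial video picture.
    if intersection_ratio(device_box, part_box) <= thresholds["reference"]:
        return False
    # Steps 507-510: track the device frame onward and count re-appearances.
    target_number, prev = 0, device_box
    for frame in frames[start + 1:]:
        boxes = detect(frame)
        if boxes is not None and intersection_ratio(boxes[0], prev) > thresholds["target"]:
            target_number += 1
            prev = boxes[0]
    # Step 511: decide from the target occurrence number.
    return target_number > thresholds["times"]
```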
Fig. 6 is a schematic structural diagram of a recording detection apparatus according to an embodiment of the present application. As shown in fig. 6, the apparatus includes:
an acquisition module 601, configured to acquire the continuous multi-frame video pictures included in a video to be subjected to recording detection;
a determining module 602, configured to determine an initial video picture among the multi-frame video pictures, the initial video picture being the video picture in which the target part and a recording device of the target form first appear;
the determining module 602 is further configured to determine a target intersection ratio between the detection frame of the recording device of the target form and the detection frame of the target part in the initial video picture;
a processing module 603, configured to process the video pictures other than the initial video picture among the multi-frame video pictures based on the target intersection ratio being greater than the reference threshold, to obtain the target occurrence number of the recording device of the target form in the video;
the determining module 602 is further configured to determine whether the content of the video includes recording behavior according to the target occurrence number.
In a possible implementation, the processing module 603 is configured to acquire an initial matrix according to a target number of frames of video pictures among the multi-frame video pictures, the initial matrix including the sub-matrices of those video pictures, the sub-matrix of any video picture indicating the position information of the detection frame containing the recording device of the target form in that video picture, and the target number of frames of video pictures being the continuous target number of frames starting from the initial video picture; acquire first position information of the detection frame containing the recording device of the target form in a first video picture, the first video picture being the video picture that immediately follows the target number of frames of video pictures among the multi-frame video pictures; acquire, according to the sub-matrices included in the initial matrix, the position information of the detection frame containing the recording device of the target form in the video picture corresponding to each sub-matrix, to obtain a plurality of second position information; acquire, according to the first position information and the plurality of second position information, the first occurrence number of the recording device of the target form from the initial video picture to the first video picture; and determine the target occurrence number of the recording device of the target form according to the initial matrix and the first occurrence number.
In a possible implementation, the processing module 603 is configured to determine the first intersection ratios between the detection frame determined by the first position information and the detection frames determined by the respective second position information, to obtain a target number of first intersection ratios; determine a first number, namely the number of those first intersection ratios that are greater than the target threshold; based on the first number being greater than the number threshold, take the first number as the first occurrence number of the recording device of the target form from the initial video picture to the first video picture; and based on the first number being not greater than the number threshold, take a second number, smaller than the first number, as that first occurrence number.
In a possible implementation, the processing module 603 is configured to update the initial matrix according to the sub-matrix of the first video picture to obtain a target matrix, the target matrix including the sub-matrix of the first video picture; acquire third position information of the detection frame containing the recording device of the target form in a second video picture, the second video picture being the video picture that follows and is adjacent to the first video picture; acquire, according to the plurality of sub-matrices included in the target matrix, the position information of the detection frame containing the recording device of the target form in the video picture corresponding to each sub-matrix, to obtain a plurality of fourth position information; update the first occurrence number according to the third position information and the plurality of fourth position information, to obtain the second occurrence number of the recording device of the target form between the initial video picture and the second video picture; and traverse, according to this updating process, the video pictures of the video other than the target number of frames of video pictures, the first video picture and the second video picture, to obtain the target occurrence number of the recording device of the target form in the video.
In a possible implementation, the processing module 603 is configured to delete the sub-matrix of the initial video picture from the initial matrix to obtain a reference matrix, and acquire the target matrix according to the reference matrix and the sub-matrix of the first video picture.
In a possible implementation, the processing module 603 is configured to determine the second intersection ratios between the detection frame determined by the third position information and the detection frames determined by the respective fourth position information, to obtain a target number of second intersection ratios; determine a second number, namely the number of those second intersection ratios that are greater than the target threshold; and update the first occurrence number according to the second number, to obtain the second occurrence number of the recording device of the target form between the initial video picture and the second video picture.
In a possible implementation, the processing module 603 is configured to update the first occurrence number according to the second number to obtain a reference number; acquire reference position information of the detection frame containing the recording device of the target form in a reference video picture, the reference video picture being the video picture that precedes the second video picture by the target number of frames; determine a third intersection ratio between the detection frame determined by the reference position information and the detection frame determined by the third position information; and adjust the reference number according to the third intersection ratio, to obtain the second occurrence number of the recording device of the target form between the initial video picture and the second video picture.
In a possible implementation, the processing module 603 is configured to add a third value to the first occurrence number to obtain the reference number based on the second number being greater than the number threshold, and add a fourth value, smaller than the third value, to the first occurrence number to obtain the reference number based on the second number being not greater than the number threshold.
In a possible implementation, the processing module 603 is configured to take the reference number as the second occurrence number of the recording device of the target form between the initial video picture and the second video picture based on the third intersection ratio being greater than the target threshold, and add a fifth value to the reference number to obtain that second occurrence number based on the third intersection ratio being not greater than the target threshold.
In a possible implementation, the determining module 602 is configured to acquire, based on the target occurrence number being greater than the times threshold and a third video picture including both the recording device of the target form and the target part, fifth position information of the detection frame of the recording device of the target form and sixth position information of the detection frame of the target part in the third video picture, the third video picture being the last video picture detected among the multi-frame video pictures; determine, according to the fifth position information and the sixth position information, a fourth intersection ratio between the recording device of the target form and the target part in the third video picture; and determine that the content of the video includes recording behavior based on the fourth intersection ratio being greater than the intersection ratio threshold.
In one possible implementation, the apparatus further includes:
an adding module, configured to add a target mark at the position indicated by the fifth position information in the third video picture, the target mark indicating that a recording device of the target form exists at that position.
The apparatus determines an initial video picture among the continuous multi-frame video pictures included in the video; when the target intersection ratio between the detection frame of the recording device of the target form and the detection frame of the target part in the initial video picture is greater than the reference threshold, it tracks the multi-frame video pictures to obtain the target occurrence number of the recording device of the target form in the video, and determines whether the content of the video includes recording behavior according to that number. The apparatus requires no manual participation, which saves the time required for recording detection and improves its efficiency; and because the determination is made by tracking the whole video rather than from a single video picture, the accuracy of recording detection is also improved.
It should be understood that the division into the above functional modules is merely illustrative of how the apparatus implements its functions; in practical applications, the functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus provided in the foregoing embodiment and the corresponding method embodiments belong to the same concept; the specific implementation process of the apparatus is detailed in the method embodiments and is not repeated here.
Fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application. The server 700 may vary considerably in configuration or performance, and may include one or more processors (Central Processing Units, CPU) 701 and one or more memories 702, where at least one program code is stored in the one or more memories 702 and is loaded and executed by the one or more processors 701 to implement the recording detection method provided by each of the above method embodiments. Of course, the server 700 may also have a wired or wireless network interface, a keyboard, an input/output interface and other components for implementing the functions of the device, which are not described here.
Fig. 8 shows a block diagram of aterminal device 800 according to an exemplary embodiment of the present application. Theterminal device 800 may be a portable mobile terminal such as: a smart phone, a tablet computer, an MP3 player (Moving Picture Experts Group Audio Layer III, motion picture expert compression standard audio plane 3), an MP4 (Moving Picture Experts Group Audio Layer IV, motion picture expert compression standard audio plane 4) player, a notebook computer, or a desktop computer.Terminal device 800 may also be referred to by other names of user devices, portable terminals, laptop terminals, desktop terminals, and the like.
In general, theterminal device 800 includes: a processor 801 and amemory 802.
Processor 801 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like. The processor 801 may be implemented in at least one hardware form of DSP (Digital Signal Processing ), FPGA (Field-Programmable Gate Array, field programmable gate array), PLA (Programmable Logic Array ). The processor 801 may also include a main processor, which is a processor for processing data in an awake state, also referred to as a CPU (Central Processing Unit ), and a coprocessor; a coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 801 may integrate a GPU (Graphics Processing Unit, image processor) for taking care of rendering and rendering of the content that the display screen is required to display. In some embodiments, the processor 801 may also include an AI (Artificial Intelligence ) processor for processing computing operations related to machine learning.
The memory 802 may include one or more computer-readable storage media, which may be non-transitory. The memory 802 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 802 is used to store at least one instruction, which is executed by the processor 801 to implement the recording detection method provided by the method embodiments of this application.
In some embodiments, the terminal device 800 may optionally further include a peripheral interface 803 and at least one peripheral. The processor 801, the memory 802, and the peripheral interface 803 may be connected by a bus or signal line. Each peripheral may be connected to the peripheral interface 803 by a bus, signal line, or circuit board. Specifically, the peripherals include at least one of a radio frequency circuit 804, a display 805, a camera assembly 806, an audio circuit 807, a positioning assembly 808, and a power supply 809.
The peripheral interface 803 may be used to connect at least one Input/Output (I/O) related peripheral to the processor 801 and the memory 802. In some embodiments, the processor 801, the memory 802, and the peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 804 is configured to receive and transmit RF (Radio Frequency) signals, also known as electromagnetic signals. The radio frequency circuit 804 communicates with communication networks and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 804 may communicate with other terminal devices via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may also include NFC (Near Field Communication) related circuitry, which is not limited in this application.
The display 805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 805 is a touch display, the display 805 also has the ability to collect touch signals on or above its surface. A touch signal may be input to the processor 801 as a control signal for processing. In this case, the display 805 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 805, disposed on the front panel of the terminal device 800; in other embodiments, there may be at least two displays 805, disposed on different surfaces of the terminal device 800 or in a folded design; in still other embodiments, the display 805 may be a flexible display disposed on a curved or folded surface of the terminal device 800. The display 805 may even be set in a non-rectangular irregular pattern, that is, a shaped screen. The display 805 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 806 is used to capture images or video. Optionally, the camera assembly 806 includes a front camera and a rear camera. Typically, the front camera is provided on the front panel of the terminal device 800 and the rear camera is provided on the back of the terminal device 800. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so as to implement a background blurring function by fusing the main camera and the depth camera, panoramic shooting and VR (Virtual Reality) shooting functions by fusing the main camera and the wide-angle camera, or other fused shooting functions. In some embodiments, the camera assembly 806 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash is a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 807 may include a microphone and a speaker. The microphone is used to collect sound waves of the user and the environment, convert them into electrical signals, and input the electrical signals to the processor 801 for processing or to the radio frequency circuit 804 for voice communication. For stereo acquisition or noise reduction, multiple microphones may be disposed at different parts of the terminal device 800. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert an electrical signal not only into sound waves audible to humans but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 807 may also include a headphone jack.
The positioning component 808 is used to locate the current geographic location of the terminal device 800 to enable navigation or LBS (Location Based Service). The positioning component 808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 809 is used to supply power to the various components in the terminal device 800. The power supply 809 may be alternating current, direct current, a disposable battery, or a rechargeable battery. When the power supply 809 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is charged through a wired line, and a wireless rechargeable battery is charged through a wireless coil. The rechargeable battery may also be used to support fast-charge technology.
In some embodiments, the terminal device 800 also includes one or more sensors 810, including but not limited to: an acceleration sensor 811, a gyroscope sensor 812, a pressure sensor 813, a fingerprint sensor 814, an optical sensor 815, and a proximity sensor 816.
The acceleration sensor 811 can detect the magnitudes of acceleration on the three coordinate axes of the coordinate system established with the terminal device 800. For example, the acceleration sensor 811 may be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 801 may control the display screen 805 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 811. The acceleration sensor 811 may also be used to collect motion data of a game or a user.
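As an illustration of how the gravity components just described can drive the landscape/portrait decision, here is a minimal sketch. The axis convention, the threshold value, and the function name are all assumptions for illustration, not taken from this application:

```python
import math

def orientation_from_gravity(gx, gy, gz, threshold=0.6):
    """Pick a view orientation from gravity components along the device axes.

    Hypothetical convention: gravity mostly along the device's x-axis means
    the device is held sideways (landscape); mostly along y means upright
    (portrait); otherwise the device is lying flat.
    """
    g = math.sqrt(gx * gx + gy * gy + gz * gz) or 1.0  # avoid division by zero
    if abs(gx) / g > threshold:
        return "landscape"
    if abs(gy) / g > threshold:
        return "portrait"
    return "flat"
```

A processor such as 801 would call this on each new accelerometer sample and switch the UI only when the returned orientation changes.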
The gyroscope sensor 812 may detect the body direction and rotation angle of the terminal device 800, and may cooperate with the acceleration sensor 811 to collect the user's 3D actions on the terminal device 800. Based on the data collected by the gyroscope sensor 812, the processor 801 may implement functions such as motion sensing (for example, changing the UI according to a tilting operation by the user), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 813 may be disposed on a side frame of the terminal device 800 and/or at a lower layer of the display 805. When the pressure sensor 813 is disposed on a side frame of the terminal device 800, a user's grip signal on the terminal device 800 can be detected, and the processor 801 performs left/right-hand recognition or a quick operation according to the grip signal acquired by the pressure sensor 813. When the pressure sensor 813 is disposed at the lower layer of the display 805, the processor 801 controls an operability control on the UI according to the user's pressure operation on the display 805. The operability control includes at least one of a button control, a scroll-bar control, an icon control, and a menu control.
The fingerprint sensor 814 is used to collect a user's fingerprint, and the processor 801 identifies the user based on the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 itself identifies the user based on the collected fingerprint. When the user's identity is recognized as trusted, the processor 801 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 814 may be provided on the front, back, or side of the terminal device 800. When a physical button or vendor logo is provided on the terminal device 800, the fingerprint sensor 814 may be integrated with the physical button or vendor logo.
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the display 805 according to the ambient light intensity collected by the optical sensor 815: when the ambient light intensity is high, the display brightness of the display 805 is turned up; when the ambient light intensity is low, the display brightness of the display 805 is turned down. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera assembly 806 according to the ambient light intensity collected by the optical sensor 815.
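The brightness adjustment described above can be sketched as a clamped mapping from ambient light to a brightness fraction. All constants and names below are illustrative assumptions, not values from this application:

```python
def display_brightness(lux, lo=10.0, hi=1000.0, min_b=0.05, max_b=1.0):
    """Map an ambient light reading (in lux) to a display brightness fraction.

    Below `lo` lux the display stays at the minimum brightness; above `hi`
    lux it stays at the maximum; in between, brightness ramps up linearly.
    """
    if lux <= lo:
        return min_b
    if lux >= hi:
        return max_b
    return min_b + (max_b - min_b) * (lux - lo) / (hi - lo)
```

In practice the raw lux readings would typically be smoothed first, so the screen does not flicker when the sensor value jitters around a boundary.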
The proximity sensor 816, also called a distance sensor, is typically provided on the front panel of the terminal device 800. The proximity sensor 816 is used to collect the distance between the user and the front face of the terminal device 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front face of the terminal device 800 gradually decreases, the processor 801 controls the display 805 to switch from the screen-on state to the screen-off state; when the proximity sensor 816 detects that the distance between the user and the front face of the terminal device 800 gradually increases, the processor 801 controls the display 805 to switch from the screen-off state to the screen-on state.
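The "gradually decreases / gradually increases" logic above amounts to checking the trend of recent distance readings against a near threshold. A minimal sketch, with the threshold and function name as illustrative assumptions:

```python
def next_screen_state(state, distances, near_cm=5.0):
    """Decide the next screen state from a short history of proximity
    readings (in cm, oldest first).

    The screen switches off when the distance is strictly decreasing and
    ends below the near threshold (face approaching), and back on when it
    is strictly increasing and ends at or beyond it (face moving away).
    """
    decreasing = all(a > b for a, b in zip(distances, distances[1:]))
    increasing = all(a < b for a, b in zip(distances, distances[1:]))
    if state == "on" and decreasing and distances[-1] < near_cm:
        return "off"
    if state == "off" and increasing and distances[-1] >= near_cm:
        return "on"
    return state
```

Requiring a monotonic trend rather than a single reading avoids toggling the screen on sensor noise.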
It will be appreciated by those skilled in the art that the structure shown in Fig. 8 does not constitute a limitation on the terminal device 800, which may include more or fewer components than shown, combine certain components, or adopt a different arrangement of components.
In an exemplary embodiment, there is also provided a computer-readable storage medium having stored therein at least one program code loaded and executed by a processor to cause a computer to implement any of the above-described recording detection methods.
Alternatively, the above computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, a computer program or computer program product is also provided, in which at least one computer instruction is stored; the at least one computer instruction is loaded and executed by a processor to cause a computer to implement any of the above-described recording detection methods.
It should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.), and signals involved in this application are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the videos involved in this application are all acquired with sufficient authorization.
It should be understood that "a plurality" herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
The sequence numbers of the foregoing embodiments of the present application are merely for description and do not represent the advantages or disadvantages of the embodiments.
The foregoing descriptions are merely exemplary embodiments of the present application and are not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the principles of the present application shall fall within the protection scope of the present application.

Claims (14)

1. A video recording detection method, characterized in that the method comprises:
acquiring continuous multi-frame video pictures included in a video to be subjected to recording detection;
determining an initial video picture among the multi-frame video pictures, the initial video picture being the video picture in which a target part and a recording device of a target form appear for the first time;
determining a target intersection-over-union ratio between a detection frame containing the recording device of the target form and a detection frame containing the target part in the initial video picture;
based on the target intersection-over-union ratio being greater than a reference threshold, processing the video pictures other than the initial video picture among the multi-frame video pictures to obtain a number of target occurrences of the recording device of the target form in the video;
determining, according to the number of target occurrences, whether the content of the video includes a recording behavior.

2. The method according to claim 1, characterized in that processing the video pictures other than the initial video picture among the multi-frame video pictures to obtain the number of target occurrences of the recording device of the target form in the video comprises:
obtaining an initial matrix according to a target number of frames of video pictures among the multi-frame video pictures, the initial matrix including a sub-matrix for each of the target number of frames, the sub-matrix of any video picture indicating position information of the detection frame containing the recording device of the target form in that video picture, and the target number of frames being consecutive video pictures starting from the initial video picture;
acquiring first position information of the detection frame containing the recording device of the target form in a first video picture, the first video picture being the video picture that appears immediately after the target number of frames among the multi-frame video pictures;
acquiring, according to the multiple sub-matrices included in the initial matrix, position information of the detection frame containing the recording device of the target form in the video picture corresponding to each sub-matrix, to obtain multiple pieces of second position information;
acquiring, according to the first position information and the multiple pieces of second position information, a first number of occurrences of the recording device of the target form between the initial video picture and the first video picture;
determining, according to the initial matrix and the first number of occurrences, the number of target occurrences of the recording device of the target form in the video.

3. The method according to claim 2, characterized in that acquiring the first number of occurrences according to the first position information and the multiple pieces of second position information comprises:
determining first intersection-over-union ratios between the detection frame determined by the first position information and the detection frames determined by the multiple pieces of second position information, respectively, to obtain a target number of first intersection-over-union ratios;
determining, among the target number of first intersection-over-union ratios, a first count of those greater than a target threshold;
based on the first count being greater than a count threshold, taking a first value as the first number of occurrences of the recording device of the target form between the initial video picture and the first video picture;
based on the first count not being greater than the count threshold, taking a second value as the first number of occurrences, the second value being smaller than the first value.

4. The method according to claim 2 or 3, characterized in that determining, according to the initial matrix and the first number of occurrences, the number of target occurrences of the recording device of the target form in the video comprises:
updating the initial matrix according to the sub-matrix of the first video picture to obtain a target matrix, the target matrix including the sub-matrix of the first video picture;
acquiring third position information of the detection frame containing the recording device of the target form in a second video picture, the second video picture being the video picture after and adjacent to the first video picture;
acquiring, according to the multiple sub-matrices included in the target matrix, position information of the detection frame containing the recording device of the target form in the video picture corresponding to each sub-matrix, to obtain multiple pieces of fourth position information;
updating the first number of occurrences according to the third position information and the multiple pieces of fourth position information to obtain a second number of occurrences of the recording device of the target form between the initial video picture and the second video picture;
traversing, according to the above updating process, the video pictures in the video other than the target number of frames, the first video picture, and the second video picture, to obtain the number of target occurrences of the recording device of the target form in the video.

5. The method according to claim 4, characterized in that updating the initial matrix according to the sub-matrix of the first video picture to obtain the target matrix comprises:
deleting the sub-matrix of the initial video picture from the initial matrix to obtain a reference matrix;
obtaining the target matrix according to the reference matrix and the sub-matrix of the first video picture.

6. The method according to claim 4, characterized in that updating the first number of occurrences according to the third position information and the multiple pieces of fourth position information to obtain the second number of occurrences comprises:
determining second intersection-over-union ratios between the detection frame determined by the third position information and the detection frames determined by the multiple pieces of fourth position information, respectively, to obtain a target number of second intersection-over-union ratios;
determining, among the target number of second intersection-over-union ratios, a second count of those greater than a target threshold;
updating the first number of occurrences according to the second count to obtain the second number of occurrences.

7. The method according to claim 6, characterized in that updating the first number of occurrences according to the second count to obtain the second number of occurrences comprises:
updating the first number of occurrences according to the second count to obtain a reference number;
acquiring reference position information of the detection frame containing the recording device of the target form in a reference video picture, the reference video picture being a video picture before the second video picture and separated from the second video picture by the target number of video pictures;
determining a third intersection-over-union ratio between the detection frame determined by the reference position information and the detection frame determined by the third position information;
adjusting the reference number according to the third intersection-over-union ratio to obtain the second number of occurrences.

8. The method according to claim 7, characterized in that updating the first number of occurrences according to the second count to obtain the reference number comprises:
based on the second count being greater than a count threshold, adding a third value to the first number of occurrences to obtain the reference number;
based on the second count not being greater than the count threshold, adding a fourth value to the first number of occurrences to obtain the reference number, the fourth value being smaller than the third value.

9. The method according to claim 7, characterized in that adjusting the reference number according to the third intersection-over-union ratio to obtain the second number of occurrences comprises:
based on the third intersection-over-union ratio being greater than a target threshold, taking the reference number as the second number of occurrences;
based on the third intersection-over-union ratio not being greater than the target threshold, adding a fifth value to the reference number to obtain the second number of occurrences.

10. The method according to any one of claims 1 to 3 and 5 to 9, characterized in that determining, according to the number of target occurrences, whether the content of the video includes a recording behavior comprises:
based on the number of target occurrences being greater than an occurrence threshold and a third video picture including both the recording device of the target form and the target part, acquiring fifth position information of the detection frame containing the recording device of the target form and sixth position information of the detection frame containing the target part in the third video picture, the third video picture being the last video picture to be detected among the multi-frame video pictures;
determining, according to the fifth position information and the sixth position information, a fourth intersection-over-union ratio between the recording device of the target form and the target part in the third video picture;
based on the fourth intersection-over-union ratio being greater than an intersection-over-union threshold, determining that the content of the video includes a recording behavior.

11. The method according to claim 10, characterized in that after determining, based on the fourth intersection-over-union ratio being greater than the intersection-over-union threshold, that the content of the video includes a recording behavior, the method further comprises:
adding a target mark at the position indicated by the fifth position information in the third video picture, the target mark indicating that a recording device of the target form exists at the position indicated by the fifth position information.

12. A video recording detection apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire continuous multi-frame video pictures included in a video to be subjected to recording detection;
a determination module, configured to determine an initial video picture among the multi-frame video pictures, the initial video picture being the video picture in which a target part and a recording device of a target form appear for the first time;
the determination module being further configured to determine a target intersection-over-union ratio between a detection frame containing the recording device of the target form and a detection frame containing the target part in the initial video picture;
a processing module, configured to, based on the target intersection-over-union ratio being greater than a reference threshold, process the video pictures other than the initial video picture among the multi-frame video pictures to obtain a number of target occurrences of the recording device of the target form in the video;
the determination module being further configured to determine, according to the number of target occurrences, whether the content of the video includes a recording behavior.

13. An electronic device, characterized in that the electronic device comprises a processor and a memory, at least one program code being stored in the memory and loaded and executed by the processor to cause the electronic device to implement the video recording detection method according to any one of claims 1 to 11.

14. A computer-readable storage medium, characterized in that at least one program code is stored in the computer-readable storage medium and is loaded and executed by a processor to cause a computer to implement the video recording detection method according to any one of claims 1 to 11.
CN202310142716.4A (filed 2023-02-08, priority date 2023-02-08): Method, device and equipment for detecting video recording and computer readable storage medium. Status: Pending. Publication: CN116137071A (en).

Priority Applications (1)

CN202310142716.4A · Priority date: 2023-02-08 · Filing date: 2023-02-08 · Title: Method, device and equipment for detecting video recording and computer readable storage medium


Publications (1)

CN116137071A (en) · Publication date: 2023-05-19

Family

Family ID: 86333452

Family Applications (1)

CN202310142716.4A (Pending, published as CN116137071A) · Priority date: 2023-02-08 · Filing date: 2023-02-08

Country Status (1)

Country: CN · CN116137071A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
CN119229331A (en)* · Priority date: 2023-06-29 · Publication date: 2024-12-31 · Assignee: China National Petroleum Corporation · Title: Exploration well quality detection method, device, equipment and readable storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
JP2013110718A (en) * | 2011-11-24 | 2013-06-06 | Kyocera Corp | Apparatus with camera function, program, and sneak photography prevention control method
CN111476059A (en) * | 2019-01-23 | 2020-07-31 | Beijing Qihu Technology Co Ltd | Target detection method and device, computer equipment and storage medium
CN111523347A (en) * | 2019-02-01 | 2020-08-11 | Beijing Qihu Technology Co Ltd | Image detection method and device, computer equipment and storage medium
CN111985331A (en) * | 2020-07-20 | 2020-11-24 | 中电天奥有限公司 | Detection method and device for preventing secret of business from being stolen
CN112883755A (en) * | 2019-11-29 | 2021-06-01 | Wuhan University of Science and Technology | Smoking and calling detection method based on deep learning and behavior prior
CN113408379A (en) * | 2021-06-04 | 2021-09-17 | 开放智能机器(上海)有限公司 | Mobile phone candid behavior monitoring method and system
CN114067441A (en) * | 2022-01-14 | 2022-02-18 | Hefei High-Dimensional Data Technology Co Ltd | Shooting and recording behavior detection method and system
CN114143532A (en) * | 2020-09-04 | 2022-03-04 | Huawei Technologies Co Ltd | Method and device for diagnosing abnormality of a camera
CN115311742A (en) * | 2022-08-01 | 2022-11-08 | 北京麦哲科技有限公司 | Target detection method and device
CN115601704A (en) * | 2022-10-28 | 2023-01-13 | Guangxi Power Grid Co Ltd | Method for recognizing photographing behavior under fixed monitoring


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LAPTEV VLADISLAV ET AL: "System for detecting dynamic objects on video sequence frames", 2022 International Siberian Conference on Control and Communications (SIBCON), 4 January 2023 (2023-01-04) *
ZHANG Wenhao; WU Huaiyu: "Development and algorithm research of an anti-candid-photography system based on camera detection" (基于摄像头检测的防盗拍系统开发和算法研究), Electronic Design Engineering (电子设计工程), no. 18, 20 September 2013 (2013-09-20) *
WANG Xiaoyuan et al.: "A survey of anti-candid screen photography methods" (屏幕防窃拍方法综述), Computer Science (计算机科学), vol. 2019, no. 1, 8 July 2019 (2019-07-08) *
CAI Jing et al.: "Practical Application Course of Face Recognition for Public Security" (人脸识别公安实战应用教程), Southwest Jiaotong University Press, Chengdu, 30 September 2022, page 111 *


Similar Documents

Publication | Title
CN111754386B (en) | Image area shielding method, device, equipment and storage medium
US11386586B2 (en) | Method and electronic device for adding virtual item
CN112084811A (en) | Identity information determining method and device and storage medium
CN109886208B (en) | Object detection method and device, computer equipment and storage medium
CN112581358B (en) | Training method of image processing model, image processing method and device
CN113627413B (en) | Data labeling method, image comparison method and device
CN115497082A (en) | Method, apparatus and storage medium for determining subtitles in video
CN113592874B (en) | Image display method, device and computer equipment
CN111860064B (en) | Video-based target detection method, device, equipment and storage medium
CN111586279B (en) | Method, device and equipment for determining shooting state and storage medium
CN113407774B (en) | Cover determination method, device, computer equipment and storage medium
CN112990424A (en) | Method and device for training neural network model
CN113032590B (en) | Special effect display method, device, computer equipment and computer readable storage medium
CN116137071A (en) | Method, device and equipment for detecting video recording and computer readable storage medium
CN111639639B (en) | Method, device, equipment and storage medium for detecting text area
CN114615520B (en) | Subtitle positioning method, subtitle positioning device, computer equipment and medium
CN114594885B (en) | Application icon management method, device, equipment and computer-readable storage medium
CN113590877B (en) | Method and device for acquiring annotation data
CN118135255A (en) | Training method of image matching model, image matching method and computer equipment
CN113129221B (en) | Image processing method, device, equipment and storage medium
CN111723615B (en) | Method and device for judging matching of detected objects in detected object image
CN111698453B (en) | Video processing method and device
CN114140431A (en) | Method, apparatus, device and computer-readable storage medium for authenticating images
CN113763486B (en) | Dominant hue extraction method, device, electronic equipment and storage medium
CN112669291B (en) | Picture processing method, device, equipment and computer readable storage medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
