Disclosure of Invention
Embodiments of the present application provide a method, an apparatus, a device, and a computer-readable storage medium for detecting video recording, which can solve the problems of low accuracy, high cost, and low efficiency of video recording detection in the related art. The technical solution is as follows:
In one aspect, an embodiment of the present application provides a method for detecting video recording, the method including:
acquiring continuous multi-frame video pictures included in a video to be subjected to recording detection;
determining an initial video picture among the multi-frame video pictures, the initial video picture being the video picture in which a target part and a recording device in a target form appear for the first time;
determining a target intersection-over-union (IoU) ratio between a detection frame containing the recording device in the target form and a detection frame containing the target part in the initial video picture;
processing, based on the target intersection ratio being greater than a reference threshold, video pictures other than the initial video picture among the multi-frame video pictures, to obtain a target occurrence count of the recording device in the target form in the video;
and determining, according to the target occurrence count, whether the content of the video includes a recording behavior.
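The core operation in the claims above is the intersection ratio (intersection-over-union, IoU) between two detection frames. A minimal sketch, assuming axis-aligned frames given as (x1, y1, x2, y2) corner coordinates (the coordinate convention is an assumption, not stated in the claims):

```python
def intersection_ratio(box_a, box_b):
    # Intersection-over-union of two axis-aligned detection frames,
    # each given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2.
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Corners of the overlapping region, if any.
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A ratio near 1 means the device frame and the hand frame almost coincide; comparing it against the reference threshold gates the tracking stage.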
In one possible implementation manner, the processing of the video pictures other than the initial video picture among the multi-frame video pictures, to obtain the target occurrence count of the recording device in the target form in the video, includes:
acquiring an initial matrix according to a target number of frames of video pictures among the multi-frame video pictures, where the initial matrix includes a sub-matrix for each of the target number of frames of video pictures, the sub-matrix of any video picture indicates position information of the detection frame containing the recording device in the target form in that video picture, and the target number of frames of video pictures are consecutive video pictures starting from the initial video picture;
acquiring first position information of the detection frame containing the recording device in the target form in a first video picture, where the first video picture is the video picture that immediately follows the target number of frames of video pictures among the multi-frame video pictures;
acquiring, according to the sub-matrices included in the initial matrix, position information of the detection frame containing the recording device in the target form in the video picture corresponding to each sub-matrix, to obtain a plurality of pieces of second position information;
acquiring, according to the first position information and the plurality of pieces of second position information, a first occurrence count of the recording device in the target form from the initial video picture to the first video picture;
and determining, according to the initial matrix and the first occurrence count, the target occurrence count of the recording device in the target form in the video.
In one possible implementation manner, the acquiring, according to the first position information and the plurality of pieces of second position information, of the first occurrence count of the recording device in the target form from the initial video picture to the first video picture includes:
determining first intersection ratios between the detection frame determined by the first position information and the detection frames determined by the plurality of pieces of second position information respectively, to obtain a target number of first intersection ratios;
determining a first number of first intersection ratios, among the target number of first intersection ratios, that are greater than a target threshold;
based on the first number being greater than a number threshold, taking a first numerical value as the first occurrence count of the recording device in the target form from the initial video picture to the first video picture;
and based on the first number being not greater than the number threshold, taking a second numerical value as the first occurrence count of the recording device in the target form from the initial video picture to the first video picture, where the second numerical value is smaller than the first numerical value.
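The counting rule above can be sketched as follows. `first_value` and `second_value` stand for the first and second numerical values; the `iou` helper and the (x1, y1, x2, y2) box format are illustrative assumptions:

```python
def iou(a, b):
    # Intersection-over-union of two (x1, y1, x2, y2) detection frames.
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def first_occurrence_count(first_box, matrix_boxes, iou_threshold,
                           number_threshold, first_value, second_value):
    # Count how many detection frames recorded in the initial matrix overlap
    # the frame of the first video picture above the IoU target threshold...
    first_number = sum(1 for b in matrix_boxes
                       if iou(first_box, b) > iou_threshold)
    # ...then return the first or the (smaller) second numerical value.
    return first_value if first_number > number_threshold else second_value
```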
In one possible implementation manner, the determining, according to the initial matrix and the first occurrence count, of the target occurrence count of the recording device in the target form in the video includes:
updating the initial matrix according to the sub-matrix of the first video picture to obtain a target matrix, where the target matrix includes the sub-matrix of the first video picture;
acquiring third position information of the detection frame containing the recording device in the target form in a second video picture, where the second video picture is the video picture that follows and is adjacent to the first video picture;
acquiring, according to the plurality of sub-matrices included in the target matrix, position information of the detection frame containing the recording device in the target form in the video picture corresponding to each sub-matrix, to obtain a plurality of pieces of fourth position information;
updating the first occurrence count according to the third position information and the plurality of pieces of fourth position information, to obtain a second occurrence count of the recording device in the target form from the initial video picture to the second video picture;
and traversing, according to the above updating process, the video pictures in the video other than the target number of frames of video pictures, the first video picture, and the second video picture, to obtain the target occurrence count of the recording device in the target form in the video.
In one possible implementation manner, the updating of the initial matrix according to the sub-matrix of the first video picture to obtain the target matrix includes:
deleting the sub-matrix of the initial video picture from the initial matrix to obtain a reference matrix;
and acquiring the target matrix according to the reference matrix and the sub-matrix of the first video picture.
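The matrix update above is a sliding-window step: drop the oldest sub-matrix, append the newest. A sketch, assuming the matrix is kept as an ordered list of per-picture sub-matrices (the list representation is an assumption):

```python
def update_matrix(initial_matrix, first_picture_submatrix):
    # Delete the sub-matrix of the initial video picture (the oldest entry)
    # to form the reference matrix...
    reference_matrix = initial_matrix[1:]
    # ...then append the sub-matrix of the first video picture to obtain
    # the target matrix.
    return reference_matrix + [first_picture_submatrix]
```

The window length (the target number of frames) therefore stays constant as the traversal advances picture by picture.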
In one possible implementation manner, the updating of the first occurrence count according to the third position information and the plurality of pieces of fourth position information, to obtain the second occurrence count of the recording device in the target form from the initial video picture to the second video picture, includes:
determining second intersection ratios between the detection frame determined by the third position information and the detection frames determined by the plurality of pieces of fourth position information respectively, to obtain a target number of second intersection ratios;
determining a second number of second intersection ratios, among the target number of second intersection ratios, that are greater than a target threshold;
and updating the first occurrence count according to the second number, to obtain the second occurrence count of the recording device in the target form from the initial video picture to the second video picture.
In one possible implementation manner, the updating of the first occurrence count according to the second number, to obtain the second occurrence count of the recording device in the target form from the initial video picture to the second video picture, includes:
updating the first occurrence count according to the second number to obtain a reference count;
acquiring reference position information of the detection frame containing the recording device in the target form in a reference video picture, where the reference video picture is a video picture that precedes the second video picture and is separated from the second video picture by the target number of video pictures;
determining a third intersection ratio between the detection frame determined by the reference position information and the detection frame determined by the third position information;
and adjusting the reference count according to the third intersection ratio, to obtain the second occurrence count of the recording device in the target form from the initial video picture to the second video picture.
In one possible implementation manner, the updating of the first occurrence count according to the second number to obtain the reference count includes:
based on the second number being greater than a number threshold, adding a third numerical value to the first occurrence count to obtain the reference count;
and based on the second number being not greater than the number threshold, adding a fourth numerical value to the first occurrence count to obtain the reference count, where the fourth numerical value is smaller than the third numerical value.
In one possible implementation manner, the adjusting of the reference count according to the third intersection ratio, to obtain the second occurrence count of the recording device in the target form from the initial video picture to the second video picture, includes:
based on the third intersection ratio being greater than a target threshold, taking the reference count as the second occurrence count of the recording device in the target form from the initial video picture to the second video picture;
and based on the third intersection ratio being not greater than the target threshold, adding a fifth numerical value to the reference count, to obtain the second occurrence count of the recording device in the target form from the initial video picture to the second video picture.
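Taken together with the preceding implementation, the two-step count update can be sketched as follows. All thresholds and increments (the third, fourth, and fifth numerical values) are free parameters here, not values fixed by the claims:

```python
def update_occurrence_count(first_count, second_number, number_threshold,
                            third_value, fourth_value,
                            third_iou, iou_threshold, fifth_value):
    # Step 1: add the third or the (smaller) fourth numerical value,
    # depending on how many sub-matrix frames matched above the threshold.
    reference_count = first_count + (third_value
                                     if second_number > number_threshold
                                     else fourth_value)
    # Step 2: temporal-consistency check against the reference video picture;
    # when the third intersection ratio is not greater than the target
    # threshold, the fifth numerical value is additionally added.
    if third_iou > iou_threshold:
        return reference_count
    return reference_count + fifth_value
```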
In one possible implementation manner, the determining, according to the target occurrence count, of whether the content of the video includes a recording behavior includes:
based on the target occurrence count being greater than a count threshold and a third video picture including both the recording device in the target form and the target part, acquiring, in the third video picture, fifth position information of the detection frame of the recording device in the target form and sixth position information of the detection frame of the target part, where the third video picture is the last detected video picture among the multi-frame video pictures;
determining, according to the fifth position information and the sixth position information, a fourth intersection ratio between the recording device in the target form and the target part in the third video picture;
and determining, based on the fourth intersection ratio being greater than an intersection-ratio threshold, that the content of the video includes a recording behavior.
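The final decision combines the accumulated occurrence count with one last device-to-hand overlap check in the third video picture. A minimal sketch, with both thresholds as free parameters:

```python
def includes_recording_behavior(target_count, count_threshold,
                                fourth_iou, iou_threshold):
    # Recording behavior is reported only when the device in the target form
    # was tracked often enough across the video AND, in the last detected
    # picture, its detection frame still sufficiently overlaps the detection
    # frame of the target part.
    return target_count > count_threshold and fourth_iou > iou_threshold
```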
In one possible implementation manner, after the determining, based on the fourth intersection ratio being greater than the intersection-ratio threshold, that the content of the video includes a recording behavior, the method further includes:
adding a target mark at the position indicated by the fifth position information in the third video picture, where the target mark indicates that the recording device in the target form is present at the position indicated by the fifth position information.
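Adding the target mark could, for example, mean drawing a rectangle at the fifth position information; in practice this would typically be a library call such as OpenCV's `cv2.rectangle`. A dependency-free sketch on a 2D pixel grid is shown here, with all names illustrative:

```python
def add_target_mark(frame, box, mark=1):
    # frame: 2D list of pixel values; box: (x1, y1, x2, y2) pixel indices.
    # Draws a rectangular outline as the target mark at the indicated position.
    x1, y1, x2, y2 = box
    for x in range(x1, x2 + 1):
        frame[y1][x] = mark  # top edge
        frame[y2][x] = mark  # bottom edge
    for y in range(y1, y2 + 1):
        frame[y][x1] = mark  # left edge
        frame[y][x2] = mark  # right edge
    return frame
```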
In another aspect, an embodiment of the present application provides an apparatus for detecting video recording, including:
an acquisition module, configured to acquire continuous multi-frame video pictures included in a video to be subjected to recording detection;
a determining module, configured to determine an initial video picture among the multi-frame video pictures, where the initial video picture is the video picture in which a target part and a recording device in a target form appear for the first time;
the determining module is further configured to determine a target intersection ratio between a detection frame containing the recording device in the target form and a detection frame containing the target part in the initial video picture;
a processing module, configured to process, based on the target intersection ratio being greater than a reference threshold, video pictures other than the initial video picture among the multi-frame video pictures, to obtain a target occurrence count of the recording device in the target form in the video;
the determining module is further configured to determine, according to the target occurrence count, whether the content of the video includes a recording behavior.
In one possible implementation manner, the processing module is configured to acquire an initial matrix according to a target number of frames of video pictures among the multi-frame video pictures, where the initial matrix includes a sub-matrix for each of the target number of frames of video pictures, the sub-matrix of any video picture indicates position information of the detection frame containing the recording device in the target form in that video picture, and the target number of frames of video pictures are consecutive video pictures starting from the initial video picture;
acquire first position information of the detection frame containing the recording device in the target form in a first video picture, where the first video picture is the video picture that immediately follows the target number of frames of video pictures among the multi-frame video pictures;
acquire, according to the sub-matrices included in the initial matrix, position information of the detection frame containing the recording device in the target form in the video picture corresponding to each sub-matrix, to obtain a plurality of pieces of second position information;
acquire, according to the first position information and the plurality of pieces of second position information, a first occurrence count of the recording device in the target form from the initial video picture to the first video picture;
and determine, according to the initial matrix and the first occurrence count, the target occurrence count of the recording device in the target form in the video.
In one possible implementation manner, the processing module is configured to determine first intersection ratios between the detection frame determined by the first position information and the detection frames determined by the plurality of pieces of second position information respectively, to obtain a target number of first intersection ratios;
determine a first number of first intersection ratios, among the target number of first intersection ratios, that are greater than a target threshold;
based on the first number being greater than a number threshold, take a first numerical value as the first occurrence count of the recording device in the target form from the initial video picture to the first video picture;
and based on the first number being not greater than the number threshold, take a second numerical value as the first occurrence count of the recording device in the target form from the initial video picture to the first video picture, where the second numerical value is smaller than the first numerical value.
In one possible implementation manner, the processing module is configured to update the initial matrix according to the sub-matrix of the first video picture to obtain a target matrix, where the target matrix includes the sub-matrix of the first video picture;
acquire third position information of the detection frame containing the recording device in the target form in a second video picture, where the second video picture is the video picture that follows and is adjacent to the first video picture;
acquire, according to the plurality of sub-matrices included in the target matrix, position information of the detection frame containing the recording device in the target form in the video picture corresponding to each sub-matrix, to obtain a plurality of pieces of fourth position information;
update the first occurrence count according to the third position information and the plurality of pieces of fourth position information, to obtain a second occurrence count of the recording device in the target form from the initial video picture to the second video picture;
and traverse, according to the above updating process, the video pictures in the video other than the target number of frames of video pictures, the first video picture, and the second video picture, to obtain the target occurrence count of the recording device in the target form in the video.
In one possible implementation manner, the processing module is configured to delete the sub-matrix of the initial video picture from the initial matrix to obtain a reference matrix;
and acquire the target matrix according to the reference matrix and the sub-matrix of the first video picture.
In one possible implementation manner, the processing module is configured to determine second intersection ratios between the detection frame determined by the third position information and the detection frames determined by the plurality of pieces of fourth position information respectively, to obtain a target number of second intersection ratios;
determine a second number of second intersection ratios, among the target number of second intersection ratios, that are greater than a target threshold;
and update the first occurrence count according to the second number, to obtain the second occurrence count of the recording device in the target form from the initial video picture to the second video picture.
In one possible implementation manner, the processing module is configured to update the first occurrence count according to the second number to obtain a reference count;
acquire reference position information of the detection frame containing the recording device in the target form in a reference video picture, where the reference video picture is a video picture that precedes the second video picture and is separated from the second video picture by the target number of video pictures;
determine a third intersection ratio between the detection frame determined by the reference position information and the detection frame determined by the third position information;
and adjust the reference count according to the third intersection ratio, to obtain the second occurrence count of the recording device in the target form from the initial video picture to the second video picture.
In one possible implementation manner, the processing module is configured to, based on the second number being greater than a number threshold, add a third numerical value to the first occurrence count to obtain a reference count;
and based on the second number being not greater than the number threshold, add a fourth numerical value to the first occurrence count to obtain the reference count, where the fourth numerical value is smaller than the third numerical value.
In one possible implementation manner, the processing module is configured to, based on the third intersection ratio being greater than a target threshold, take the reference count as the second occurrence count of the recording device in the target form from the initial video picture to the second video picture;
and based on the third intersection ratio being not greater than the target threshold, add a fifth numerical value to the reference count, to obtain the second occurrence count of the recording device in the target form from the initial video picture to the second video picture.
In one possible implementation manner, the determining module is configured to, based on the target occurrence count being greater than a count threshold and a third video picture including both the recording device in the target form and the target part, acquire, in the third video picture, fifth position information of the detection frame of the recording device in the target form and sixth position information of the detection frame of the target part, where the third video picture is the last detected video picture among the multi-frame video pictures;
determine, according to the fifth position information and the sixth position information, a fourth intersection ratio between the recording device in the target form and the target part in the third video picture;
and determine, based on the fourth intersection ratio being greater than an intersection-ratio threshold, that the content of the video includes a recording behavior.
In one possible implementation, the apparatus further includes:
an adding module, configured to add a target mark at the position indicated by the fifth position information in the third video picture, where the target mark indicates that the recording device in the target form is present at the position indicated by the fifth position information.
In another aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a processor and a memory, the memory stores at least one program code, and the at least one program code is loaded and executed by the processor, so that the electronic device implements any one of the above methods for detecting video recording.
In another aspect, a computer-readable storage medium is provided, having at least one program code stored therein, where the at least one program code is loaded and executed by a processor, so that a computer implements any one of the above methods for detecting video recording.
In another aspect, a computer program or computer program product is provided, having at least one computer instruction stored therein, where the at least one computer instruction is loaded and executed by a processor, so that a computer implements any one of the above methods for detecting video recording.
The technical solution provided by the embodiments of the present application brings at least the following beneficial effects:
According to the technical solution, an initial video picture is determined among the continuous multi-frame video pictures included in the video; when the target intersection ratio between the detection frame containing the recording device in the target form and the detection frame containing the target part in the initial video picture is greater than the reference threshold, the multi-frame video pictures included in the video are tracked, so that the target occurrence count of the recording device in the target form in the video is obtained, and whether the content of the video includes a recording behavior is determined according to the target occurrence count. The method requires no manual participation, which saves the time required for recording detection and improves detection efficiency; and because whether the video content includes a recording behavior is determined by tracking the whole video rather than judging from a single video picture in the video, the accuracy of recording detection is also improved.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of an implementation environment of a method for detecting video recording according to an embodiment of the present application. As shown in Fig. 1, the implementation environment includes an electronic device 101. The electronic device 101 may be a terminal device or a server, which is not limited in the embodiments of the present application. The electronic device 101 is configured to execute the method for detecting video recording provided in the embodiments of the present application.
Optionally, when the electronic device 101 is a terminal device, the terminal device is any electronic product that can perform human-machine interaction with a user through one or more of a keyboard, a touch pad, a touch screen, a remote controller, voice interaction, a handwriting device, or the like, for example, a PC (Personal Computer), a mobile phone, a smart phone, a PDA (Personal Digital Assistant), a wearable device, a PPC (Pocket PC), a tablet computer, a smart in-vehicle device, a smart television, a smart speaker, and the like. When the electronic device 101 is a server, the server may be a single server, a server cluster formed by a plurality of servers, or a cloud computing service center. The terminal device establishes a communication connection with the server through a wired or wireless network.
It will be appreciated by those skilled in the art that the above terminal devices and servers are merely illustrative, and that other terminal devices or servers, existing now or developed in the future, that are applicable to the present application are also intended to fall within the scope of protection of the present application and are incorporated herein by reference.
An embodiment of the present application provides a method for detecting video recording, which can be applied to the above implementation environment. Taking the flowchart of the method shown in Fig. 2 as an example, the method can be executed by the electronic device 101 in Fig. 1. As shown in Fig. 2, the method includes the following steps 201 to 205.
In step 201, continuous multi-frame video pictures included in a video to be subjected to recording detection are acquired.
In one possible implementation, the storage space of the electronic device stores the continuous multi-frame video pictures included in the video to be subjected to recording detection, and the electronic device acquires them from its storage space. The video to be subjected to recording detection may be a video that has already been recorded or a video that is being recorded, which is not limited in the embodiments of the present application.
Alternatively, the electronic device acquires the video to be subjected to recording detection and performs framing processing on it to obtain the continuous multi-frame video pictures included in the video. The manner of acquiring the video to be subjected to recording detection includes, but is not limited to, the following four manners.
In a first manner, a plurality of candidate videos are stored in the electronic device, and any one of the candidate videos is used as the video to be subjected to recording detection.
In a second manner, the electronic device is a terminal device in which a plurality of candidate videos are stored, and a selected candidate video is used as the video to be subjected to recording detection.
In a third manner, the electronic device is a server, and the server and a terminal device establish a communication connection through a wired or wireless network; a plurality of candidate videos are stored in the terminal device, the terminal device uses a selected candidate video as the video to be subjected to recording detection, and sends it to the server so that the server acquires the video to be subjected to recording detection.
In a fourth manner, the electronic device is a terminal device in which a first application program for video capture is installed and running. The terminal device invokes the first application program to capture a video, and uses the captured video as the video to be subjected to recording detection.
The first application program may be any application capable of capturing video, which is not limited in the embodiments of the present application. Illustratively, the first application program is a camera application. The captured video may be a video whose capture has been completed, or a video that is still being captured, which is not limited in this embodiment.
It should be noted that any of the above manners may be selected to acquire the video to be subjected to recording detection, which is not limited in the embodiments of the present application.
In one possible implementation manner, a second application program for video framing is installed and running in the electronic device; the second application program may be any program capable of performing a video framing operation, which is not limited in the embodiments of the present application. Illustratively, the second application program is video editing software such as Premiere Pro (PR). After acquiring the video to be subjected to recording detection, the electronic device invokes the second application program to perform framing processing on the video, obtaining the continuous multi-frame video pictures included in the video. Illustratively, framing the video to be subjected to recording detection yields 7 continuous frames of video pictures included in the video.
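A framing step like the one described can be sketched around a `cv2.VideoCapture`-style `read()` interface, which returns `(ok, frame)` pairs; abstracting the reader as a callable is an assumption that keeps the sketch runnable without a real video file:

```python
def split_into_frames(read_frame):
    # read_frame() mirrors cv2.VideoCapture.read(): it returns (ok, frame)
    # and ok becomes False once the video is exhausted.
    frames = []
    while True:
        ok, frame = read_frame()
        if not ok:
            break
        frames.append(frame)
    return frames
```

With OpenCV, `read_frame` would be the bound method `cv2.VideoCapture(path).read`.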
In step 202, an initial video picture is determined among the multi-frame video pictures, the initial video picture being the video picture in which a target part and a recording device in a target form appear for the first time.
The target part is a hand, and the target form of the recording device means that the back of the recording device squarely faces the lens, that is, the initial video picture includes a frontal image of the back of the recording device. The recording device is any device capable of photographing or video recording, which is not limited in the embodiments of the present application. Illustratively, the recording device is a mobile phone or a camera. For example, if the recording device is a mobile phone, the recording device in the target form means that the back of the mobile phone squarely faces the lens, that is, the video picture includes a frontal image of the back of the mobile phone.
In one possible implementation, the process of determining the initial video picture among the multi-frame video pictures includes: detecting each frame of video picture to obtain the video content included in each frame; determining, among the multi-frame video pictures, the video pictures whose video content includes both a recording device and the target part; classifying the recording devices included in those video pictures to obtain the form of each recording device; and taking, among the video pictures whose video content includes both a recording device and the target part, the earliest-appearing video picture in which the form of the recording device is the target form as the initial video picture.
A target detection network is invoked to detect each frame of video picture, obtaining the video content included in each frame. A target classification network is then invoked to classify the recording devices included in the video pictures whose video content includes both a recording device and the target part, obtaining the form of each recording device.
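The two-stage pipeline (detect, then classify) can be sketched as follows; the label names, the `"frontal_back"` form tag, and the callable signatures are all illustrative assumptions rather than interfaces defined by the application:

```python
def find_initial_video_picture(frames, detect, classify,
                               target_form="frontal_back"):
    # detect(frame) -> list of (label, crop) pairs from the target detection
    # network; classify(crop) -> form label from the target classification
    # network. Returns the index of the earliest video picture containing
    # both the target part (hand) and a recording device in the target form.
    for index, frame in enumerate(frames):
        detections = detect(frame)
        labels = {label for label, _ in detections}
        if "hand" in labels and "device" in labels:
            for label, crop in detections:
                if label == "device" and classify(crop) == target_form:
                    return index
    return None  # no initial video picture found
```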
The target detection network needs to be acquired before it is invoked. The process of acquiring the target detection network includes: acquiring a first training data set and an initial detection network, where the first training data set includes first images and the image content contained in each first image, each first image includes an image of a recording device and an image of the target part, and the image content contained in the first image includes the recording device and the target part. The initial detection network is any network capable of content detection, illustratively YOLO (You Only Look Once). The initial detection network is trained according to the first training data set to obtain the target detection network.
Optionally, the process of training the initial detection network according to the first training data set to obtain the target detection network includes: performing data enhancement on the first images in the first training data set to obtain a data-enhanced first training data set, and training the initial detection network according to the data-enhanced first training data set to obtain the target detection network. The data enhancement applied to the first images includes, but is not limited to, random scaling, data normalization, image stitching, and the like.
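The enhancement steps named above can be sketched in Python. This is an illustrative sketch only, not part of the embodiment: the nearest-neighbour resize stands in for random scaling, the per-image standardization stands in for data normalization, and image stitching is omitted for brevity.

```python
import numpy as np

def augment(image, rng):
    """Illustrative augmentation: random scaling then data normalization.

    `image` is a 2-D numpy array (one channel); `rng` is a
    numpy.random.Generator supplying the random scale factor.
    """
    scale = rng.uniform(0.8, 1.2)                 # random scaling factor
    h, w = image.shape[:2]
    new_h, new_w = max(1, int(h * scale)), max(1, int(w * scale))
    # nearest-neighbour resize, sufficient for a sketch
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    resized = image[rows][:, cols].astype(np.float32)
    # data normalization: zero mean, unit-ish variance
    return (resized - resized.mean()) / (resized.std() + 1e-6)
```

In practice the enhanced images would simply replace (or be added alongside) the originals in the first training data set before the detection network is trained.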
The target classification network needs to be acquired before it is invoked. The process of acquiring the target classification network includes the following steps: acquiring a second training data set and an initial classification network, where the second training data set includes second images with their forms and third images with their forms. The second images and the third images are both images of a recording device; the form of the second images is the front-facing back form, and the form of the third images is a non-front-facing back form. The initial classification network is any network capable of content classification, illustratively a residual neural network (ResNet). The initial classification network is trained according to the second training data set to obtain the target classification network. Optionally, according to the second training data set, the parameters of the initial classification network are updated using a learning strategy of pre-trained weights combined with fine-tuning (fine-tune), so as to obtain the target classification network.
Optionally, the process of training the initial classification network according to the second training data set to obtain the target classification network includes: when the numbers of second images and third images included in the second training data set are unbalanced, extracting second images and third images from the second training data set by random sampling, where the number of extracted second images equals the number of extracted third images; and training the initial classification network according to the extracted second images and their forms and the extracted third images and their forms to obtain the target classification network. Data enhancement may also be performed on the extracted second images and third images, and the initial classification network is then trained according to the enhanced images and the forms of the second and third images to obtain the target classification network. The data enhancement applied to the extracted second images and third images includes, but is not limited to, random cropping, random horizontal flipping, random vertical flipping, random Gaussian blur, scaling, random zero padding on the left-right or up-down sides, and the like.
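The balanced random sampling described above can be sketched as follows (an illustrative sketch; the function name and seed parameter are not part of the embodiment):

```python
import random

def balance_classes(front_back_images, other_images, seed=0):
    """Randomly sample the larger class down to the size of the smaller,
    so the classifier is trained on equal numbers of front-facing-back
    and non-front-facing-back recording-device images."""
    rng = random.Random(seed)
    n = min(len(front_back_images), len(other_images))
    return rng.sample(front_back_images, n), rng.sample(other_images, n)
```

The two returned lists, together with their form labels, would then be fed (optionally after data enhancement) to the initial classification network.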
In step 203, a target intersection ratio between the detection frame of the recording device in the target form and the detection frame of the target part in the initial video picture is determined.
Optionally, the process of determining the target intersection ratio between the detection frame of the recording device in the target form and the detection frame of the target part in the initial video picture includes: if an overlap region exists between the detection frame of the recording device in the target form and the detection frame of the target part in the initial video picture, determining a first area, namely the area of the overlap region between the two detection frames; determining a second area, namely the area of the figure formed by the detection frame of the recording device in the target form and the detection frame of the target part in the initial video picture; and taking the ratio of the first area to the second area as the target intersection ratio between the two detection frames. If no overlap region exists between the two detection frames, 0 is taken as the target intersection ratio.
Fig. 3 is a schematic diagram of a detection frame of a recording device in the target form and a detection frame of a target part in an initial video picture according to an embodiment of the present application, where 301 is the detection frame of the recording device in the target form and 302 is the detection frame of the target part. Since an overlap region exists between 301 and 302, the hatched region in Fig. 3 is the overlap region; therefore, the ratio of the area of the hatched region to the area of the figure formed by the two detection frames is taken as the target intersection ratio between the detection frame of the recording device in the target form and the detection frame of the target part in the initial video picture.
Fig. 4 is a schematic diagram of a detection frame of a recording device in the target form and a detection frame of a target part in an initial video picture according to another embodiment of the present application, where 401 is the detection frame of the recording device in the target form and 402 is the detection frame of the target part. Since no overlap region exists between 401 and 402, 0 is taken as the target intersection ratio between the detection frame of the recording device in the target form and the detection frame of the target part in the initial video picture.
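The computation illustrated by Figs. 3 and 4 is the familiar intersection-over-union of two axis-aligned boxes. A minimal Python sketch, assuming each box is given as an (x1, y1, x2, y2) tuple with (x1, y1) the upper-left corner and (x2, y2) the lower-right corner:

```python
def intersection_ratio(box_a, box_b):
    """Intersection ratio (IoU) of two axis-aligned detection frames.

    Returns 0.0 when the frames share no overlap region, matching the
    rule described for Fig. 4.
    """
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    if ix2 <= ix1 or iy2 <= iy1:
        return 0.0                              # no overlap region
    first_area = (ix2 - ix1) * (iy2 - iy1)      # area of the overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    second_area = area_a + area_b - first_area  # area of the combined figure
    return first_area / second_area
```

The same routine serves for the first, second, third, fourth, and fifth intersection ratios referred to later, since they differ only in which pair of detection frames is compared.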
In step 204, based on the target intersection ratio being greater than the reference threshold, the video pictures other than the initial video picture among the multiple frames of video pictures are processed to obtain the target occurrence number of the recording device in the target form in the video.
In one possible implementation, the reference threshold is set based on experience or adjusted according to the implementation environment, which is not limited in the embodiment of the present application. The process of processing the video pictures other than the initial video picture to obtain the target occurrence number of the recording device in the target form in the video includes: acquiring an initial matrix according to a target number of frames of video pictures among the multiple frames of video pictures; acquiring, according to a first video picture and the initial matrix, a first occurrence number of the recording device in the target form from the initial video picture to the first video picture; and determining the target occurrence number of the recording device in the target form according to the initial matrix and the first occurrence number.
The target number is set based on experience or adjusted according to the total number of frames of video pictures included in the video, which is not limited in the embodiment of the present application; the target number is greater than zero and less than that total number of frames. The initial matrix includes the submatrices of the target number of frames of video pictures, and the submatrix of any video picture is used to indicate the position information of the detection frame of the recording device in the target form in that video picture. The target number of frames of video pictures are consecutive frames starting from the initial video picture, and the first video picture is the video picture that immediately follows the target number of frames among the multiple frames of video pictures.
Illustratively, if the target number is 3 and the initial video picture is the first frame, the target number of frames of video pictures are the first, second, and third frames, and the first video picture is the fourth frame. For another example, if the target number is 3 and the initial video picture is the fourth frame, the target number of frames of video pictures are the fourth, fifth, and sixth frames, and the first video picture is the seventh frame.
In one possible implementation, the process of acquiring the initial matrix according to the target number of frames of video pictures includes: processing the target number of frames of video pictures to obtain the submatrices respectively corresponding to them, and obtaining the initial matrix according to these submatrices. Optionally, the submatrices respectively corresponding to the target number of frames of video pictures are stacked to obtain the initial matrix. The initial matrix records the target number of frames of video pictures and provides a basis for subsequently determining the occurrence number of the recording device.
Optionally, the process of processing the target number of frames of video pictures to obtain their respective submatrices includes: for any one of these frames, when the frame includes a recording device and the form of the recording device is the target form, a matrix corresponding to the position information of the detection frame of the recording device in the target form is used as the submatrix of that frame; when the frame includes a recording device whose form is not the target form, or does not include a recording device, a first matrix is used as the submatrix of that frame.
The first matrix is set based on experience or adjusted according to the implementation environment, which is not limited in the embodiment of the present application; illustratively, the first matrix is [0, 0]. The matrix corresponding to the position information of the detection frame of the recording device in the target form is a matrix composed of that position information. The position information may be the positions of the upper-left and lower-right corners of the detection frame, the positions of the upper-right and lower-left corners, or the position of the center point together with the length and width of the detection frame, which is not limited in the embodiment of the present application. For example, if the position of the upper-left corner of the detection frame of the recording device in the target form is (X1, Y1) and the position of the lower-right corner is (X2, Y2), the matrix composed of the position information of the detection frame is [X1, Y1, X2, Y2].
In one possible implementation, for any one of the target number of frames of video pictures, that frame is detected to obtain the video content it includes, and the recording device included in that frame is classified based on the video content to obtain the form of the recording device. This procedure is similar to the procedure of determining the initial video picture in step 202 and is not repeated here.
Optionally, the process of acquiring, according to the first video picture and the initial matrix, the first occurrence number of the recording device in the target form from the initial video picture to the first video picture includes: acquiring first position information of the detection frame of the recording device in the target form in the first video picture; acquiring, according to the multiple submatrices included in the initial matrix, the position information of the detection frame of the recording device in the target form in the video picture corresponding to each submatrix, so as to obtain multiple pieces of second position information; and acquiring, according to the first position information and the multiple pieces of second position information, the first occurrence number of the recording device in the target form from the initial video picture to the first video picture.
The process of acquiring the first position information of the detection frame of the recording device in the target form in the first video picture includes: taking the position information of that detection frame as the first position information. For example, if the position of the upper-left corner of the detection frame of the recording device in the target form included in the first video picture is (X1, Y1) and the position of the lower-right corner is (X2, Y2), then (X1, Y1) and (X2, Y2) are taken as the first position information.
The process of acquiring, according to the multiple submatrices included in the initial matrix, the position information of the detection frame of the recording device in the target form in the video picture corresponding to each submatrix includes: taking the position information corresponding to any submatrix in the initial matrix as one piece of second position information. For example, if one of the submatrices in the initial matrix is [X3, Y3, X4, Y4], then (X3, Y3) and (X4, Y4) are taken as the second position information of the detection frame of the recording device in the target form in the video picture corresponding to that submatrix.
Optionally, the process of acquiring the first occurrence number according to the first position information and the multiple pieces of second position information includes: determining the first intersection ratios between the detection frame determined by the first position information and the detection frames determined by the pieces of second position information, so as to obtain a target number of first intersection ratios; determining a first number, namely how many of the target number of first intersection ratios are greater than a target threshold; based on the first number being greater than a number threshold, taking a first numerical value as the first occurrence number of the recording device in the target form from the initial video picture to the first video picture; and based on the first number not being greater than the number threshold, taking a second numerical value, smaller than the first numerical value, as the first occurrence number.
The target threshold, the number threshold, the first numerical value, and the second numerical value are all set based on experience or adjusted according to the implementation environment, which is not limited in the embodiment of the present application. Optionally, the number threshold is determined based on an occurrence frequency and the target number; for example, the number threshold is the product of the occurrence frequency and the target number. The occurrence frequency measures how frequently the recording device in the target form appears in consecutive frames of the video. The target threshold measures the allowed movement amplitude of the recording device in the target form: the smaller the target threshold, the larger the allowed movement amplitude, and the larger the target threshold, the smaller the allowed movement amplitude. Illustratively, the target threshold is 80%, the number threshold is 2, the first numerical value is 1, and the second numerical value is 0.
The process of determining the first intersection ratios between the detection frame determined by the first position information and the detection frames determined by the pieces of second position information includes: for any piece of second position information, if an overlap region exists between the detection frame determined by the first position information and the detection frame determined by that second position information, determining a third area, namely the area of the overlap region between the two detection frames; determining a fourth area, namely the area of the figure formed by the two detection frames; and taking the ratio of the third area to the fourth area as the first intersection ratio between the two detection frames. If no overlap region exists between the two detection frames, 0 is taken as the first intersection ratio.
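The counting rule of the last few paragraphs can be sketched in Python. The default thresholds and values follow the illustrative figures in the text (target threshold 80%, number threshold 2, first numerical value 1, second numerical value 0); the function names are illustrative only.

```python
def first_occurrence_count(first_box, window_boxes, target_threshold=0.8,
                           number_threshold=2, first_value=1, second_value=0):
    """Compute the intersection ratio between the first video picture's
    box and each box in the target-number window, count how many exceed
    the target threshold, and return the first numerical value when that
    count exceeds the number threshold, the second numerical value
    otherwise."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        if ix2 <= ix1 or iy2 <= iy1:
            return 0.0
        inter = (ix2 - ix1) * (iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union

    first_number = sum(1 for b in window_boxes
                       if iou(first_box, b) > target_threshold)
    return first_value if first_number > number_threshold else second_value
```

With three nearly identical boxes in the window, all three ratios exceed 80%, 3 > 2, and the count is 1; with boxes far from the first box, every ratio is 0 and the count is 0.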
In this way of determining the first occurrence number, on the one hand, judging the intersection ratios of the recording device between the first video picture and the target number of frames of video pictures models the spatial characteristic of recording behavior, namely that the recording device stays pointed squarely at the camera while shaking only slightly; on the other hand, counting occurrences of the recording device over the first video picture combined with the target number of frames exploits the temporal characteristic of the video, so the first occurrence number is determined more accurately. In addition, this judgment method, which combines the information of the first video picture and the target number of frames of video pictures, increases fault tolerance against missed detections of the recording device.
In one possible implementation, the process of determining the target occurrence number of the recording device in the target form according to the initial matrix and the first occurrence number includes: updating the initial matrix according to the submatrix of the first video picture to obtain a target matrix, where the target matrix includes the submatrix of the first video picture; updating the first occurrence number according to a second video picture and the target matrix to obtain a second occurrence number of the recording device in the target form from the initial video picture to the second video picture; and traversing, according to this updating process, the video pictures in the video other than the target number of frames of video pictures, the first video picture, and the second video picture, so as to obtain the target occurrence number of the recording device in the target form in the video.
The second video picture is the video picture immediately following the first video picture. Illustratively, if the first video picture is the fourth frame, the second video picture is the fifth frame.
Before the initial matrix is updated according to the submatrix of the first video picture, that submatrix needs to be acquired; its acquisition is similar to the acquisition of the submatrices corresponding to the target number of frames of video pictures in the above steps and is not described in detail.
Optionally, the process of updating the initial matrix according to the submatrix of the first video picture to obtain the target matrix includes: deleting the submatrix of the initial video picture from the initial matrix to obtain a reference matrix, and acquiring the target matrix according to the reference matrix and the submatrix of the first video picture. For example, the reference matrix and the submatrix of the first video picture are stacked to obtain the target matrix.
Illustratively, the initial matrix includes the submatrices of the first, second, and third frame video pictures, and the first frame is the initial video picture; deleting the submatrix of the first frame from the initial matrix yields a reference matrix that includes the submatrices of the second and third frames. The reference matrix and the submatrix of the first video picture are then stacked to obtain the target matrix, which includes the submatrices of the second frame, the third frame, and the fourth frame (i.e., the submatrix of the first video picture).
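The matrix update just described is a fixed-size sliding window: the oldest frame's submatrix is dropped and the newest frame's submatrix is appended. A minimal sketch (illustrative names; the window is held as a list of submatrices):

```python
from collections import deque

def slide_window(window, new_submatrix, target_number):
    """One update step: delete the submatrix of the oldest frame and
    append the new frame's submatrix, keeping exactly `target_number`
    submatrices in the window."""
    q = deque(window, maxlen=target_number)  # maxlen drops the oldest on append
    q.append(new_submatrix)
    return list(q)
```

With target number 3, updating the window of frames 1-3 with frame 4's submatrix yields the window of frames 2-4, exactly as in the example above.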
Optionally, the process of updating the first occurrence number according to the second video picture and the target matrix to obtain the second occurrence number of the recording device in the target form from the initial video picture to the second video picture includes: acquiring third position information of the detection frame of the recording device in the target form in the second video picture; acquiring, according to the multiple submatrices included in the target matrix, the position information of the detection frame of the recording device in the target form in the video picture corresponding to each submatrix, so as to obtain multiple pieces of fourth position information; and updating the first occurrence number according to the third position information and the pieces of fourth position information to obtain the second occurrence number.
In a possible implementation, the process of acquiring the third position information is similar to the process of acquiring the first position information in the above steps, and the process of acquiring the fourth position information is similar to the process of acquiring the second position information; neither is repeated here.
Optionally, the process of updating the first occurrence number according to the third position information and the pieces of fourth position information includes: determining the second intersection ratios between the detection frame determined by the third position information and the detection frames determined by the pieces of fourth position information, so as to obtain a target number of second intersection ratios; determining a second number, namely how many of the target number of second intersection ratios are greater than the target threshold; and updating the first occurrence number according to the second number to obtain the second occurrence number of the recording device in the target form from the initial video picture to the second video picture.
The process of determining the second intersection ratios between the detection frame determined by the third position information and the detection frames determined by the pieces of fourth position information is similar to the process of determining the first intersection ratios in the above steps and is not repeated here.
In the embodiment of the present application, the manner of updating the first occurrence number according to the second number to obtain the second occurrence number of the recording device in the target form from the initial video picture to the second video picture is not limited. Optionally, the first occurrence number is updated according to the second number through either of the following two implementations.
In the first implementation, based on the second number being greater than the number threshold, a third numerical value is added to the first occurrence number to obtain the second occurrence number of the recording device in the target form from the initial video picture to the second video picture; based on the second number not being greater than the number threshold, a reference numerical value is added to the first occurrence number to obtain the second occurrence number.
The third numerical value and the reference numerical value are set based on experience or adjusted according to the implementation environment, which is not limited in the embodiment of the present application; the reference numerical value is smaller than the third numerical value. Illustratively, the third numerical value is 1 and the reference numerical value is -1.
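The first implementation reduces to a one-line update. Defaults follow the illustrative figures in the text (number threshold 2, third numerical value 1, reference numerical value -1); the function name is illustrative.

```python
def update_count_v1(first_count, second_number, number_threshold=2,
                    third_value=1, reference_value=-1):
    """First implementation: add the third numerical value when the
    number of high-intersection-ratio frames exceeds the number
    threshold, otherwise add the (smaller) reference numerical value."""
    delta = third_value if second_number > number_threshold else reference_value
    return first_count + delta
```

So a count of 5 becomes 6 when three of the ratios clear the target threshold, and 4 when only two do.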
In the second implementation, the first occurrence number is updated according to the second number to obtain a reference number, and reference position information of the detection frame of the recording device in the target form in a reference video picture is acquired, where the reference video picture is the video picture that lies the target number of frames before the second video picture; a third intersection ratio between the detection frame determined by the reference position information and the detection frame determined by the third position information is determined; and the reference number is adjusted according to the third intersection ratio to obtain the second occurrence number of the recording device in the target form from the initial video picture to the second video picture.
Illustratively, if the target number of frames of video pictures are the fourth, fifth, and sixth frames, the first video picture is the seventh frame, and the second video picture is the eighth frame, then the reference video picture is the fifth frame.
The process of updating the first occurrence number according to the second number to obtain the reference number includes: based on the second number being greater than the number threshold, adding the third numerical value to the first occurrence number to obtain the reference number; and based on the second number not being greater than the number threshold, adding a fourth numerical value, smaller than the third numerical value, to the first occurrence number to obtain the reference number. The third and fourth numerical values are set based on experience or adjusted according to the implementation environment, which is not limited in the embodiment of the present application. Illustratively, the third numerical value is 1 and the fourth numerical value is 0.
The process of determining the third intersection ratio between the detection frame determined by the reference position information and the detection frame determined by the third position information is similar to the process of determining the first intersection ratios in the above steps and is not repeated here.
Optionally, the process of adjusting the reference number according to the third intersection ratio to obtain the second occurrence number of the recording device in the target form from the initial video picture to the second video picture includes: based on the third intersection ratio being greater than the target threshold, taking the reference number as the second occurrence number; and based on the third intersection ratio not being greater than the target threshold, adding a fifth numerical value to the reference number to obtain the second occurrence number.
The target threshold and the fifth numerical value are set based on experience or adjusted according to the implementation environment, which is not limited in the embodiment of the present application. Illustratively, the target threshold is 80% and the fifth numerical value is -1.
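The second implementation can be sketched as a two-stage update: first form the reference number from the second number, then adjust it using the third intersection ratio against the reference video picture. Defaults follow the illustrative figures in the text (number threshold 2, target threshold 80%, third value 1, fourth value 0, fifth value -1); names are illustrative.

```python
def update_count_v2(first_count, second_number, third_ratio,
                    number_threshold=2, target_threshold=0.8,
                    third_value=1, fourth_value=0, fifth_value=-1):
    """Second implementation: update the count into a reference number
    from the number of high-intersection-ratio frames, then keep it when
    the third intersection ratio (current frame vs. the reference video
    picture) clears the target threshold, otherwise add the fifth value."""
    if second_number > number_threshold:
        reference_number = first_count + third_value
    else:
        reference_number = first_count + fourth_value
    if third_ratio > target_threshold:
        return reference_number
    return reference_number + fifth_value
```

For example, starting from a count of 5 with three qualifying frames: the count becomes 6 when the third ratio is 0.9, but 5 when it is only 0.5.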
It should be noted that the second occurrence number of the recording device in the target form from the initial video picture to the second video picture may be obtained through either implementation, which is not limited in the embodiment of the present application.
It should be further noted that the process of determining the target occurrence number of the recording device in the target form in the video is similar to the process of determining the second occurrence number from the initial video picture to the second video picture and is not described in detail in the embodiment of the present application.
In step 205, whether the content of the video includes a recording behavior is determined according to the target occurrence number.
In one possible implementation, there are two implementations of determining, according to the target occurrence number, whether the content of the video includes a recording behavior.
The first implementation mode is to determine that the content of the video comprises a shooting behavior based on the fact that the number of occurrence times of the target is larger than a number threshold; and determining that the content of the video does not comprise the recording behavior based on the target occurrence number not being greater than the number threshold.
The count threshold is set empirically or adjusted according to the implementation environment, which is not limited in the embodiments of the present application. Illustratively, the count threshold is 3.
In the second implementation, based on the target occurrence count being greater than the count threshold and a third video frame including the target-form recording device and the target part, fifth position information of the detection frame of the target-form recording device and sixth position information of the detection frame of the target part are acquired in the third video frame; a fourth intersection ratio of the target-form recording device and the target part in the third video frame is determined according to the fifth position information and the sixth position information; and it is determined that the content of the video includes a recording behavior based on the fourth intersection ratio being greater than an intersection ratio threshold.
The third video frame is the last detected video frame among the multiple video frames. The process of determining the fourth intersection ratio of the target-form recording device and the target part in the third video frame according to the fifth position information and the sixth position information is similar to the process of determining the first intersection ratio in the above steps, and is not repeated here. The fourth intersection ratio of the target-form recording device and the target part in the third video frame indicates the relative positional relationship between the target-form recording device and the target part in the third video frame. The intersection ratio threshold is set empirically or adjusted according to the implementation environment, which is not limited in the embodiments of the present application; illustratively, the intersection ratio threshold is 50%.
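For illustration only, the intersection ratio (intersection over union) between two detection frames, and the comparison against the intersection ratio threshold, might be computed as follows; the `(x1, y1, x2, y2)` box representation, the function names, and the exemplary 50% default are assumptions of this sketch.

```python
def iou(box_a, box_b):
    """Intersection over union of two detection frames given as
    (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def includes_recording(fourth_iou, iou_threshold=0.5):
    """The video is judged to contain a recording behavior only when the
    fourth intersection ratio strictly exceeds the threshold."""
    return fourth_iou > iou_threshold
```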
In one possible implementation, it is determined that the content of the video does not include a recording behavior based on the fourth intersection ratio not being greater than the intersection ratio threshold. Based on the target occurrence count being greater than the count threshold and the third video frame not including at least one of the target-form recording device and the target part, seventh position information of the detection frame of the target-form recording device in a candidate video frame and eighth position information of the detection frame of the target part in the candidate video frame are acquired. A fifth intersection ratio of the target-form recording device and the target part in the candidate video frame is determined according to the seventh position information and the eighth position information, and it is determined that the content of the video includes a recording behavior based on the fifth intersection ratio being greater than the intersection ratio threshold; it is determined that the content of the video does not include a recording behavior based on the fifth intersection ratio not being greater than the intersection ratio threshold. The candidate video frame is a video frame that has been detected before the third video frame, is nearest to the third video frame, and includes both the target-form recording device and the target part.
The process of determining the fifth intersection ratio of the target-form recording device and the target part in the candidate video frame according to the seventh position information and the eighth position information is similar to the process of determining the first intersection ratio in the above steps, and is not repeated here.
In one possible implementation, based on the content of the video including a recording behavior and the third video frame including the target-form recording device and the target part, a target mark is added at the position indicated by the fifth position information in the third video frame, the target mark being used to indicate that the target-form recording device exists at the position indicated by the fifth position information. The target mark may be any mark, which is not limited in the embodiments of the present application; illustratively, the target mark is a red dot.
Alternatively, based on the third video frame not including at least one of the target-form recording device and the target part, and the content of the video including a recording behavior, a reference mark is added at the position indicated by the seventh position information in the candidate video frame, the reference mark being used to indicate that the target-form recording device exists at the position indicated by the seventh position information. The reference mark may be any mark, which is not limited in the embodiments of the present application. Illustratively, the reference mark is a green dot.
Optionally, based on the electronic device being a terminal device, after adding the target mark at the position indicated by the fifth position information in the third video frame, the terminal device may display the third video frame to which the target mark has been added, so that the user knows that a target-form recording device exists at the position of the target mark. Alternatively, based on the electronic device being a server that is communicatively connected to a terminal device through a wired or wireless network, after adding the target mark at the position indicated by the fifth position information in the third video frame, the server sends the third video frame to which the target mark has been added to the terminal device; the terminal device receives and displays this video frame, so that the user knows that a target-form recording device exists at the position of the target mark.
Optionally, based on the electronic device being a terminal device, after adding the reference mark at the position indicated by the seventh position information in the candidate video frame, the terminal device may display the candidate video frame to which the reference mark has been added, so that the user knows that a target-form recording device exists at the position of the reference mark. Alternatively, based on the electronic device being a server that is communicatively connected to a terminal device through a wired or wireless network, after adding the reference mark at the position indicated by the seventh position information in the candidate video frame, the server sends the candidate video frame to which the reference mark has been added to the terminal device; the terminal device receives and displays this video frame, so that the user knows that a target-form recording device exists at the position of the reference mark.
According to the above method, an initial video frame is determined among the consecutive multiple video frames included in the video; when the target intersection ratio between the detection frame of the target-form recording device and the detection frame of the target part in the initial video frame is greater than the reference threshold, the multiple video frames included in the video are tracked to obtain the target occurrence count of the target-form recording device in the video, and whether the content of the video includes a recording behavior is determined according to the target occurrence count. The method requires no manual participation, which saves the time required for recording detection and improves its efficiency; and because whether the content of the video includes a recording behavior is determined by tracking the entire video, rather than based on a single video frame, the accuracy of recording detection is also improved.
Fig. 5 is a flowchart of a method for detecting a video recording according to an embodiment of the present application, where the method includes the following steps.
Step 501, the consecutive multiple video frames included in a video to be subjected to recording detection are acquired.
In one possible implementation, the process of acquiring the consecutive multiple video frames included in the video to be subjected to recording detection is described in step 201 above, and is not repeated here.
Step 502, a target detection network is called to process each video frame, to obtain the video content included in each video frame.
In one possible implementation, the process of acquiring the video content included in each video frame is described in step 202 above, and is not repeated here.
Step 503, among the multiple video frames, the video frames whose video content includes a recording device and the target part are determined.
In one possible implementation, the process of determining the video frames whose video content includes the recording device and the target part among the multiple video frames is described in step 202 above, and is not repeated here.
Step 504, a target classification network is called to classify the video frames that include the recording device and the target part, to obtain the form of the recording device included in those video frames.
In one possible implementation, the process of determining the form of the recording device included in the video frames that include the recording device and the target part is described in step 202 above, and is not repeated here.
Step 505, an initial video frame is determined according to the form of the recording device included in the video frames that include the recording device and the target part.
In one possible implementation, the process of determining the initial video frame is described in step 202 above, and is not repeated here.
Step 506, a target intersection ratio between the detection frame of the target-form recording device and the detection frame of the target part in the initial video frame is determined.
In one possible implementation, the process of determining the target intersection ratio between the detection frame of the target-form recording device and the detection frame of the target part in the initial video frame is described in step 203 above, and is not repeated here.
Step 507, based on the target intersection ratio being greater than the reference threshold, an initial matrix is acquired according to t video frames.
In one possible implementation, t is greater than zero and less than the total number of video frames included in the video. The t video frames are the consecutive t video frames starting from the initial video frame. The process of acquiring the initial matrix is described in step 204 above, and is not repeated here.
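As one hypothetical way to realise this, the initial matrix can be modelled as a list of t sub-matrices, one per frame, each recording the position information of the detection frame of the target-form recording device in that frame; `detect_box` stands in for the target detection network and is an assumption of this sketch.

```python
def build_initial_matrix(frames, detect_box, t):
    """Build an initial matrix for the first t consecutive video frames.

    Each sub-matrix (row) holds the (x1, y1, x2, y2) position information
    of the detection frame of the target-form recording device in one frame.
    """
    return [detect_box(frame) for frame in frames[:t]]
```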
Step 508, the sub-matrix of the (t+1)-th video frame is acquired.
In one possible implementation, the process of acquiring the sub-matrix of the (t+1)-th video frame is described in step 204 above, and is not repeated here.
Step 509, according to the sub-matrix of the (t+1)-th video frame and the initial matrix, the first occurrence count of the target-form recording device from the initial video frame to the (t+1)-th video frame is acquired.
In one possible implementation, the process of acquiring the first occurrence count of the target-form recording device from the initial video frame to the (t+1)-th video frame is described in step 204 above, and is not repeated here.
Step 510, the first occurrence count is updated to obtain the target occurrence count of the target-form recording device in the video.
In one possible implementation, the process of determining the target occurrence count of the target-form recording device in the video is similar to the process of step 205, and is not repeated here.
Step 511, whether the content of the video includes a recording behavior is determined according to the target occurrence count.
In one possible implementation, the process of determining, according to the target occurrence count, whether the content of the video includes a recording behavior is described in step 205 above, and is not repeated here.
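Steps 501 to 511 can be summarised, purely as a simplified sketch, by the following function; the `detector` and `classifier` callables stand in for the target detection and target classification networks, and every name, threshold, and the flattened counting logic are assumptions of this sketch, not the embodiment's exact procedure.

```python
def detect_recording(frames, detector, classifier, compute_iou,
                     reference_threshold=0.5, count_threshold=3):
    """Decide whether a video's content includes a recording behavior.

    `detector` maps a frame to (device_box, part_box) or None when the
    recording device or target part is absent; `classifier` maps a frame
    and a device box to a form label.
    """
    # Steps 502-505: find the initial video frame, i.e. the first frame
    # containing both a target-form recording device and the target part.
    initial = None
    for idx, frame in enumerate(frames):
        boxes = detector(frame)
        if boxes and classifier(frame, boxes[0]) == "target":
            initial = idx
            break
    if initial is None:
        return False
    # Step 506: target intersection ratio between the two detection frames.
    device_box, part_box = detector(frames[initial])
    if compute_iou(device_box, part_box) <= reference_threshold:
        return False
    # Steps 507-510 (simplified): count the later frames in which the
    # target-form recording device is still detected.
    occurrences = 0
    for frame in frames[initial + 1:]:
        boxes = detector(frame)
        if boxes and classifier(frame, boxes[0]) == "target":
            occurrences += 1
    # Step 511: a recording behavior is reported only when the count
    # exceeds the count threshold.
    return occurrences > count_threshold
```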
Fig. 6 is a schematic structural diagram of a recording detection apparatus according to an embodiment of the present application. As shown in Fig. 6, the apparatus includes:
an acquisition module 601, configured to acquire the consecutive multiple video frames included in a video to be subjected to recording detection;
a determining module 602, configured to determine an initial video frame among the multiple video frames, where the initial video frame is the video frame in which the target part and the target-form recording device appear for the first time;
the determining module 602 is further configured to determine a target intersection ratio between the detection frame of the target-form recording device and the detection frame of the target part in the initial video frame;
a processing module 603, configured to process, based on the target intersection ratio being greater than the reference threshold, the video frames other than the initial video frame among the multiple video frames, to obtain the target occurrence count of the target-form recording device in the video;
the determining module 602 is further configured to determine, according to the target occurrence count, whether the content of the video includes a recording behavior.
In one possible implementation, the processing module 603 is configured to: acquire an initial matrix according to a target number of video frames among the multiple video frames, where the initial matrix includes the sub-matrices of the target number of video frames, the sub-matrix of any video frame is used to indicate the position information of the detection frame of the target-form recording device in that video frame, and the target number of video frames are the consecutive target number of video frames starting from the initial video frame; acquire first position information of the detection frame of the target-form recording device in a first video frame, where the first video frame is the video frame that appears immediately after the target number of video frames among the multiple video frames; acquire, according to the sub-matrices included in the initial matrix, the position information of the detection frame of the target-form recording device in the video frame corresponding to each sub-matrix, to obtain multiple pieces of second position information; acquire, according to the first position information and the multiple pieces of second position information, the first occurrence count of the target-form recording device from the initial video frame to the first video frame; and determine the target occurrence count of the target-form recording device according to the initial matrix and the first occurrence count.
In one possible implementation, the processing module 603 is configured to: determine the first intersection ratios between the detection frame determined by the first position information and the detection frames determined by the pieces of second position information, to obtain a target number of first intersection ratios; determine a first number, namely the number of first intersection ratios, among the target number of first intersection ratios, that are greater than the target threshold; based on the first number being greater than a number threshold, take the first number as the first occurrence count of the target-form recording device from the initial video frame to the first video frame; and based on the first number not being greater than the number threshold, take a second number as the first occurrence count of the target-form recording device between the initial video frame and the first video frame, where the second number is smaller than the first number.
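A minimal sketch of this counting rule follows; since the second number is only stated to be smaller than the first number, `first_number - 1` is used here purely as a stand-in, and the thresholds are illustrative values, not disclosed ones.

```python
def first_occurrence_count(first_box, second_boxes, compute_iou,
                           target_threshold=0.8, number_threshold=2):
    """Count occurrences of the target-form recording device from the
    initial video frame to the first video frame."""
    # One first intersection ratio per sub-matrix of the initial matrix.
    ious = [compute_iou(first_box, box) for box in second_boxes]
    first_number = sum(1 for v in ious if v > target_threshold)
    if first_number > number_threshold:
        return first_number
    # Otherwise a smaller second number is used; first_number - 1 is only
    # a stand-in for the unspecified second number.
    return max(first_number - 1, 0)
```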
In one possible implementation, the processing module 603 is configured to: update the initial matrix according to the sub-matrix of the first video frame to obtain a target matrix, where the target matrix includes the sub-matrix of the first video frame; acquire third position information of the detection frame of the target-form recording device in a second video frame, where the second video frame is the video frame after and adjacent to the first video frame; acquire, according to the sub-matrices included in the target matrix, the position information of the detection frame of the target-form recording device in the video frame corresponding to each sub-matrix, to obtain multiple pieces of fourth position information; update the first occurrence count according to the third position information and the multiple pieces of fourth position information, to obtain the second occurrence count of the target-form recording device between the initial video frame and the second video frame; and traverse, according to the above updating process, the video frames in the video other than the target number of video frames, the first video frame, and the second video frame, to obtain the target occurrence count of the target-form recording device in the video.
In one possible implementation, the processing module 603 is configured to delete the sub-matrix of the initial video frame from the initial matrix to obtain a reference matrix, and to acquire the target matrix according to the reference matrix and the sub-matrix of the first video frame.
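This update amounts to a sliding window over sub-matrices, which might be sketched as follows; representing the matrix as a Python list of sub-matrices is an assumption of this example.

```python
def update_matrix(initial_matrix, first_frame_submatrix):
    """Drop the initial frame's sub-matrix to form the reference matrix,
    then append the first video frame's sub-matrix to obtain the target
    matrix (a sliding window over the tracked video frames)."""
    reference_matrix = initial_matrix[1:]
    return reference_matrix + [first_frame_submatrix]
```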
In one possible implementation, the processing module 603 is configured to: determine the second intersection ratios between the detection frame determined by the third position information and the detection frames determined by the pieces of fourth position information, to obtain a target number of second intersection ratios; determine a second number, namely the number of second intersection ratios, among the target number of second intersection ratios, that are greater than the target threshold; and update the first occurrence count according to the second number, to obtain the second occurrence count of the target-form recording device between the initial video frame and the second video frame.
In one possible implementation, the processing module 603 is configured to: update the first occurrence count according to the second number to obtain a reference count; acquire reference position information of the detection frame of the target-form recording device in a reference video frame, where the reference video frame is the video frame that is before the second video frame and separated from it by the target number of video frames; determine a third intersection ratio between the detection frame determined by the reference position information and the detection frame determined by the third position information; and adjust the reference count according to the third intersection ratio, to obtain the second occurrence count of the target-form recording device between the initial video frame and the second video frame.
In one possible implementation, the processing module 603 is configured to add a third value to the first occurrence count to obtain the reference count based on the second number being greater than the number threshold, and to add a fourth value to the first occurrence count to obtain the reference count based on the second number not being greater than the number threshold, where the fourth value is smaller than the third value.
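Illustratively, this rule might read as follows; the third and fourth values (with the fourth smaller than the third) and the number threshold are placeholders chosen for this sketch, not values disclosed by the embodiments.

```python
def update_reference_count(first_occurrences, second_number,
                           number_threshold=2, third_value=1, fourth_value=0):
    """Update the first occurrence count into the reference count.

    A larger increment (the third value) is applied when enough second
    intersection ratios exceeded the target threshold; otherwise the
    smaller fourth value is applied.
    """
    if second_number > number_threshold:
        return first_occurrences + third_value
    return first_occurrences + fourth_value
```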
In one possible implementation, the processing module 603 is configured to take the reference count as the second occurrence count of the target-form recording device between the initial video frame and the second video frame based on the third intersection ratio being greater than the target threshold, and to add a fifth value to the reference count to obtain the second occurrence count of the target-form recording device between the initial video frame and the second video frame based on the third intersection ratio not being greater than the target threshold.
In one possible implementation, the determining module 602 is configured to: acquire, based on the target occurrence count being greater than the count threshold and a third video frame including the target-form recording device and the target part, fifth position information of the detection frame of the target-form recording device and sixth position information of the detection frame of the target part in the third video frame, where the third video frame is the last detected video frame among the multiple video frames; determine a fourth intersection ratio of the target-form recording device and the target part in the third video frame according to the fifth position information and the sixth position information; and determine that the content of the video includes a recording behavior based on the fourth intersection ratio being greater than the intersection ratio threshold.
In one possible implementation, the apparatus further includes:
an adding module, configured to add a target mark at the position indicated by the fifth position information in the third video frame, where the target mark is used to indicate that a target-form recording device exists at the position indicated by the fifth position information.
The apparatus determines an initial video frame among the consecutive multiple video frames included in the video; when the target intersection ratio between the detection frame of the target-form recording device and the detection frame of the target part in the initial video frame is greater than the reference threshold, it tracks the multiple video frames included in the video to obtain the target occurrence count of the target-form recording device in the video, and determines whether the content of the video includes a recording behavior according to the target occurrence count. The apparatus requires no manual participation, which saves the time required for recording detection and improves its efficiency; and because whether the content of the video includes a recording behavior is determined by tracking the entire video, rather than based on a single video frame, the accuracy of recording detection is also improved.
It should be understood that when the apparatus provided above implements its functions, only the division into the above functional modules is used as an example; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus provided in the above embodiment and the method embodiments belong to the same concept; for its specific implementation process, refer to the method embodiments, which is not repeated here.
Fig. 7 is a schematic structural diagram of a server according to an embodiment of the present application. The server 700 may vary considerably in configuration or performance, and may include one or more processors (Central Processing Unit, CPU) 701 and one or more memories 702, where at least one program code is stored in the one or more memories 702 and is loaded and executed by the one or more processors 701 to implement the recording detection method provided by each of the above method embodiments. Of course, the server 700 may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface, and may further include other components for implementing the functions of the device, which are not described in detail here.
Fig. 8 shows a block diagram of a terminal device 800 according to an exemplary embodiment of the present application. The terminal device 800 may be a portable mobile terminal, such as a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The terminal device 800 may also be called a user device, a portable terminal, a laptop terminal, a desktop terminal, or other names.
In general, the terminal device 800 includes a processor 801 and a memory 802.
The processor 801 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 801 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 801 may also include a main processor and a coprocessor: the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 801 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that the display screen needs to display. In some embodiments, the processor 801 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 802 may include one or more computer-readable storage media, which may be non-transitory. The memory 802 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In some embodiments, a non-transitory computer-readable storage medium in the memory 802 is used to store at least one instruction, which is executed by the processor 801 to implement the recording detection method provided by the method embodiments of the present application.
In some embodiments, the terminal device 800 may optionally further include a peripheral interface 803 and at least one peripheral. The processor 801, the memory 802, and the peripheral interface 803 may be connected by buses or signal lines. Each peripheral may be connected to the peripheral interface 803 by a bus, a signal line, or a circuit board. Specifically, the peripherals include at least one of a radio frequency circuit 804, a display screen 805, a camera assembly 806, an audio circuit 807, a positioning assembly 808, and a power supply 809.
The peripheral interface 803 may be used to connect at least one Input/Output (I/O) related peripheral to the processor 801 and the memory 802. In some embodiments, the processor 801, the memory 802, and the peripheral interface 803 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 801, the memory 802, and the peripheral interface 803 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 804 is configured to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 804 communicates with a communication network and other communication devices via electromagnetic signals, converting an electrical signal into an electromagnetic signal for transmission, or converting a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 804 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 804 may communicate with other terminal devices via at least one wireless communication protocol, including but not limited to: the World Wide Web, metropolitan area networks, intranets, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 804 may also include NFC (Near Field Communication) related circuits, which is not limited in this application.
The display screen 805 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 805 is a touch display screen, it also has the ability to collect touch signals on or above its surface. The touch signal may be input to the processor 801 as a control signal for processing; in this case, the display screen 805 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 805, disposed on the front panel of the terminal device 800; in other embodiments, there may be at least two display screens 805, disposed on different surfaces of the terminal device 800 or in a folded design; in still other embodiments, the display screen 805 may be a flexible display screen disposed on a curved or folded surface of the terminal device 800. The display screen 805 may even be arranged in a non-rectangular irregular pattern, that is, a shaped screen. The display screen 805 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
The camera assembly 806 is used to capture images or video. Optionally, the camera assembly 806 includes a front camera and a rear camera. Typically, the front camera is provided on the front panel of the terminal device 800, and the rear camera is provided on the back of the terminal device 800. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera, and a telephoto camera, so as to realize a background blurring function by fusing the main camera and the depth-of-field camera, panoramic shooting and VR (Virtual Reality) shooting functions by fusing the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 806 may also include a flash. The flash may be a single color temperature flash or a dual color temperature flash. A dual color temperature flash is a combination of a warm light flash and a cold light flash and may be used for light compensation at different color temperatures.
The audio circuit 807 may include a microphone and a speaker. The microphone is used to collect sound waves of the user and the environment and convert them into electrical signals, which are input to the processor 801 for processing or input to the radio frequency circuit 804 for voice communication. For stereo acquisition or noise reduction, multiple microphones may be disposed at different parts of the terminal device 800. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 801 or the radio frequency circuit 804 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can convert electrical signals not only into sound waves audible to humans, but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 807 may also include a headphone jack.
The positioning component 808 is used to locate the current geographic location of the terminal device 800 to enable navigation or location-based services (LBS). The positioning component 808 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 809 is used to power the various components in the terminal device 800. The power supply 809 may be an alternating-current power supply, a direct-current power supply, a disposable battery, or a rechargeable battery. When the power supply 809 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. A wired rechargeable battery is a battery charged through a wired line, and a wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast-charging technology.
In some embodiments, the terminal device 800 also includes one or more sensors 810. The one or more sensors 810 include, but are not limited to: an acceleration sensor 811, a gyroscope sensor 812, a pressure sensor 813, a fingerprint sensor 814, an optical sensor 815, and a proximity sensor 816.
The acceleration sensor 811 can detect the magnitudes of acceleration on the three coordinate axes of a coordinate system established with the terminal device 800. For example, the acceleration sensor 811 may be used to detect the components of gravitational acceleration along the three coordinate axes. The processor 801 may control the display screen 805 to display the user interface in a landscape view or a portrait view based on the gravitational acceleration signal acquired by the acceleration sensor 811. The acceleration sensor 811 may also be used to acquire motion data of a game or of the user.
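The orientation decision described above can be sketched as follows. This is a hypothetical illustration, not logic specified by the patent: the helper name `choose_orientation` and the tie-breaking rule are assumptions.

```python
def choose_orientation(ax: float, ay: float) -> str:
    """Pick a UI orientation from gravity components, as the processor 801
    might do with data from an accelerometer like the acceleration sensor 811.

    ax, ay: gravitational acceleration along the device's x (short) and
    y (long) axes, in m/s^2. Illustrative sketch only.
    """
    # If gravity acts mostly along the long (y) axis, the device is upright,
    # so the UI is shown in portrait; otherwise it is shown in landscape.
    return "portrait" if abs(ay) >= abs(ax) else "landscape"
```

For example, a device held upright reports nearly all of gravity on the y axis and would be shown in portrait view.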
The gyroscope sensor 812 may detect the body direction and rotation angle of the terminal device 800, and may cooperate with the acceleration sensor 811 to collect the 3D motion performed by the user on the terminal device 800. Based on the data collected by the gyroscope sensor 812, the processor 801 may implement the following functions: motion sensing (for example, changing the UI according to a tilting operation by the user), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 813 may be disposed at a side frame of the terminal device 800 and/or at a lower layer of the display screen 805. When the pressure sensor 813 is disposed at a side frame of the terminal device 800, a grip signal of the user on the terminal device 800 can be detected, and the processor 801 performs left/right-hand recognition or a quick operation according to the grip signal acquired by the pressure sensor 813. When the pressure sensor 813 is disposed at the lower layer of the display screen 805, the processor 801 controls an operability control on the UI interface according to a pressure operation of the user on the display screen 805. The operability control includes at least one of a button control, a scroll bar control, an icon control, and a menu control.
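One possible reading of the left/right-hand recognition above is comparing pressure readings on the two side frames. The patent gives no formula; the function name, the threshold, and the rule that the stronger-pressed frame is the gripping side are all illustrative assumptions.

```python
def infer_holding_hand(left_pressure: float, right_pressure: float,
                       threshold: float = 0.5) -> str:
    """Guess which hand grips the device from side-frame pressure readings
    (arbitrary units), as the processor 801 might do with data from the
    pressure sensor 813. Hypothetical sketch only.
    """
    if left_pressure > right_pressure + threshold:
        return "left"       # clearly stronger grip on the left frame
    if right_pressure > left_pressure + threshold:
        return "right"      # clearly stronger grip on the right frame
    return "unknown"        # readings too close to decide
```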
The fingerprint sensor 814 is used to collect a fingerprint of the user, and the processor 801 identifies the identity of the user based on the fingerprint collected by the fingerprint sensor 814, or the fingerprint sensor 814 identifies the identity of the user based on the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 801 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 814 may be provided on the front, rear, or side of the terminal device 800. When a physical key or a vendor logo is provided on the terminal device 800, the fingerprint sensor 814 may be integrated with the physical key or the vendor logo.
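The authorization gate described above can be sketched as a simple check: sensitive operations proceed only when the recognized identity is trusted. The set of operation names and the function `authorize` are illustrative assumptions, not identifiers from the patent.

```python
# Operations the text lists as sensitive (names are illustrative).
SENSITIVE_OPERATIONS = {
    "unlock_screen", "view_encrypted_info",
    "download_software", "make_payment", "change_settings",
}

def authorize(identity_trusted: bool, operation: str) -> bool:
    """Return whether an operation may proceed, gating the sensitive ones on
    a trusted fingerprint identity. Hypothetical sketch of the flow around
    the fingerprint sensor 814 and processor 801.
    """
    if operation not in SENSITIVE_OPERATIONS:
        return True  # non-sensitive operations need no fingerprint check
    return identity_trusted
```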
The optical sensor 815 is used to collect the ambient light intensity. In one embodiment, the processor 801 may control the display brightness of the display screen 805 based on the ambient light intensity collected by the optical sensor 815. Specifically, when the ambient light intensity is high, the display brightness of the display screen 805 is turned up; when the ambient light intensity is low, the display brightness of the display screen 805 is turned down. In another embodiment, the processor 801 may also dynamically adjust the shooting parameters of the camera assembly 806 based on the ambient light intensity collected by the optical sensor 815.
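The brightness control above can be sketched as a mapping from ambient light to a brightness level: brighter surroundings raise the level, dimmer surroundings lower it. The lux thresholds and the linear ramp are illustrative assumptions; the patent only states the qualitative behavior.

```python
def display_brightness(ambient_lux: float, lo: float = 50.0,
                       hi: float = 1000.0, min_level: float = 0.1,
                       max_level: float = 1.0) -> float:
    """Map ambient light intensity (lux) to a display brightness level in
    [min_level, max_level], as the processor 801 might do with readings
    from the optical sensor 815. Illustrative sketch only.
    """
    if ambient_lux <= lo:
        return min_level                       # dim room: minimum brightness
    if ambient_lux >= hi:
        return max_level                       # bright surroundings: maximum
    frac = (ambient_lux - lo) / (hi - lo)      # linear ramp in between
    return min_level + frac * (max_level - min_level)
```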
The proximity sensor 816, also called a distance sensor, is typically provided on the front panel of the terminal device 800. The proximity sensor 816 is used to collect the distance between the user and the front face of the terminal device 800. In one embodiment, when the proximity sensor 816 detects that the distance between the user and the front face of the terminal device 800 gradually decreases, the processor 801 controls the display screen 805 to switch from the screen-on state to the screen-off state; when the proximity sensor 816 detects that the distance between the user and the front face of the terminal device 800 gradually increases, the processor 801 controls the display screen 805 to switch from the screen-off state to the screen-on state.
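The screen-state switching above can be sketched as a comparison of consecutive distance readings. A minimal sketch under assumed names; the hysteresis and debouncing a real driver would need are omitted.

```python
def next_screen_state(current_state: str, prev_distance: float,
                      distance: float) -> str:
    """Decide the display state ("on"/"off") from two consecutive distance
    readings of a proximity sensor such as the proximity sensor 816.
    Illustrative sketch only: a shrinking distance turns the screen off,
    a growing distance turns it back on.
    """
    if distance < prev_distance:
        return "off"          # user approaching the front face: screen off
    if distance > prev_distance:
        return "on"           # user moving away: screen back on
    return current_state      # no change in distance: keep current state
```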
It will be appreciated by those skilled in the art that the structure shown in Fig. 8 is not limiting, and that more or fewer components than shown may be included, certain components may be combined, or a different arrangement of components may be employed.
In an exemplary embodiment, there is also provided a computer-readable storage medium having at least one piece of program code stored therein, the program code being loaded and executed by a processor to cause a computer to implement any one of the above video recording detection methods.
Alternatively, the above-mentioned computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, or the like.
In an exemplary embodiment, there is also provided a computer program or computer program product having at least one computer instruction stored therein, the at least one computer instruction being loaded and executed by a processor to cause a computer to implement any one of the above video recording detection methods.
It should be noted that the information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, presented data, etc.), and signals referred to in this application are all authorized by the user or fully authorized by all parties, and the collection, use, and processing of the relevant data comply with the relevant laws, regulations, and standards of the relevant countries and regions. For example, the videos referred to in this application are all acquired with sufficient authorization.
It should be understood that references herein to "a plurality" mean two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship.
The foregoing embodiment numbers of the present application are merely for description and do not represent the advantages or disadvantages of the embodiments.
The foregoing description is merely of exemplary embodiments of the present application and is not intended to limit the present application. Any modification, equivalent replacement, or improvement made within the principles of the present application shall fall within the protection scope of the present application.