Disclosure of Invention
The present application aims to provide a duplicate video identification method and a related apparatus that can identify duplicate videos among a very large number of videos.
The application is implemented as follows:
A first aspect of the present application provides a duplicate video identification method, which includes:
separating a first multimedia data stream of a video file to be identified;
extracting a first multimedia data feature set in the first multimedia data stream, wherein the first multimedia data feature set comprises a plurality of first multimedia data frames;
matching the first multimedia data frame with a second multimedia data frame of a comparison video file to obtain a matching sequence pair set, wherein the matching sequence pair set comprises a plurality of matching sequence pairs;
determining whether the proportion occupied by the matching sequence pair set exceeds a preset threshold;
if the proportion exceeds the preset threshold, determining that the video file to be identified and the comparison video file are duplicates;
and if the proportion does not exceed the preset threshold, determining that the video file to be identified and the comparison video file are not duplicates.
Optionally, the determining whether the proportion occupied by the matching sequence pair set exceeds a preset threshold includes:
comparing the duration of the video file to be identified with the duration of the comparison video file to obtain a target video file, wherein the target video file is whichever of the video file to be identified and the comparison video file has the shorter duration, or either of the two when their durations are equal;
and determining whether the proportion of the duration of the target video file that is occupied by the duration of the matching sequence pair set exceeds the preset threshold.
Optionally, after obtaining the matching sequence pair set, before determining whether the proportion of the matching sequence pair set exceeds a preset threshold, the method further includes:
obtaining a valid matching frame sequence pair from a plurality of matching sequence pairs, wherein the matching sequence pair set comprises the valid matching frame sequence pair.
Optionally, the obtaining of the valid matching frame sequence pairs from the plurality of matching sequence pairs includes:
respectively extracting a first frame number of the first multimedia data frame and a second frame number of the second multimedia data frame in each pair of the matching frame sequence pairs;
judging whether the first frame numbers and the second frame numbers of adjacent matching frame sequence pairs are in incremental correspondence;
and if the first frame numbers and the second frame numbers of the adjacent matching frame sequence pairs are in incremental correspondence, determining that the first multimedia data frame and the second multimedia data frame form a valid matching frame sequence pair.
Optionally, after extracting a first frame number of the first multimedia data frame and a second frame number of the second multimedia data frame in each pair of the matching frame sequence pairs, respectively, the method further includes:
extracting the label of the first frame number of the first multimedia data frame in each pair of the matching frame sequence pairs as a coordinate value of a first coordinate axis;
extracting the label of the second frame number of the second multimedia data frame in each pair of the matched frame sequence pairs as a coordinate value of a second coordinate axis;
and establishing a target coordinate system by using the first coordinate axis and the second coordinate axis.
Optionally, after establishing the target coordinate system by using the first coordinate axis and the second coordinate axis, the method further includes:
if the first frame number of the first multimedia data frame in each pair of the matching frame sequence pairs is effectively matched with the second frame number of the second multimedia data frame, forming an identifier at a target coordinate point on the target coordinate system, wherein the target coordinate point takes the first frame number and the second frame number as coordinate values.
Optionally, the first multimedia data stream includes: a first video stream and/or a first audio stream;
the first multimedia data frame comprises: a first video frame and/or a first audio frame;
the second multimedia data frame comprises: a second video frame and/or a second audio frame.
Optionally, the first video frame includes a home decoration domain feature.
A second aspect of the present application provides a duplicate video identification apparatus, including:
a separation unit, configured to separate a first multimedia data stream of a video file to be identified;
an extracting unit, configured to extract a first multimedia data feature set in the first multimedia data stream, where the first multimedia data feature set includes a plurality of first multimedia data frames;
the matching unit is used for matching the first multimedia data frame with a second multimedia data frame of a comparison video file to obtain a matching sequence pair set, and the matching sequence pair set comprises a plurality of matching sequence pairs;
the judging unit is used for judging whether the proportion of the matching sequence pair set exceeds a preset threshold value or not;
the first determining unit is used for determining that the video file to be identified and the comparison video file are duplicates if the proportion exceeds the preset threshold;
and the second determining unit is used for determining that the video file to be identified and the comparison video file are not duplicates if the proportion does not exceed the preset threshold.
Optionally, when determining whether the proportion occupied by the matching sequence pair set exceeds a preset threshold, the determining unit is specifically configured to:
comparing the duration of the video file to be identified with the duration of the comparison video file to obtain a target video file, wherein the target video file is whichever of the video file to be identified and the comparison video file has the shorter duration, or either of the two when their durations are equal;
and determining whether the proportion of the duration of the target video file that is occupied by the duration of the matching sequence pair set exceeds the preset threshold.
Optionally, the apparatus further comprises:
an obtaining unit, configured to obtain valid matching frame sequence pairs from among the plurality of matching sequence pairs, where the matching sequence pair set includes the valid matching frame sequence pairs.
Optionally, when obtaining valid matching frame sequence pairs in a plurality of matching sequence pairs, the obtaining unit is specifically configured to:
respectively extracting a first frame number of the first multimedia data frame and a second frame number of the second multimedia data frame in each pair of the matching frame sequence pairs;
judging whether the first frame numbers and the second frame numbers of adjacent matching frame sequence pairs are in incremental correspondence;
and if the first frame numbers and the second frame numbers of the adjacent matching frame sequence pairs are in incremental correspondence, determining that the first multimedia data frame and the second multimedia data frame form a valid matching frame sequence pair.
Optionally, the apparatus further comprises:
the extracting unit is further used for extracting the label of the first frame number of the first multimedia data frame in each pair of the matching frame sequence pairs as a coordinate value of a first coordinate axis;
the extracting unit is further used for extracting the label of the second frame number of the second multimedia data frame in each pair of the matching frame sequence pairs as a coordinate value of a second coordinate axis;
and the establishing unit is used for establishing a target coordinate system by using the first coordinate axis and the second coordinate axis.
Optionally, the apparatus further comprises:
a forming unit, configured to form an identifier at a target coordinate point on the target coordinate system if the first frame number of the first multimedia data frame in each pair of the matching frame sequence pairs is effectively matched with the second frame number of the second multimedia data frame, where the target coordinate point uses the first frame number and the second frame number as coordinate values.
Optionally, the first multimedia data stream includes: a first video stream and/or a first audio stream;
the first multimedia data frame comprises: a first video frame and/or a first audio frame;
the second multimedia data frame comprises: a second video frame and/or a second audio frame.
Optionally, the first video frame includes a home decoration domain feature.
A third aspect of the present application provides a computer device comprising:
a processor, a memory, a bus, an input/output interface and a wireless network interface;
the processor is connected with the memory, the input/output interface and the wireless network interface through the bus;
the memory stores a program;
the processor, when executing the program stored in the memory, implements the duplicate video identification method of any of the preceding first aspects.
A fourth aspect of the present application provides a computer-readable storage medium having stored therein instructions which, when executed on a computer, cause the computer to perform the duplicate video identification method of any one of the preceding first aspects.
A fifth aspect of the present application provides a computer program product which, when executed on a computer, causes the computer to perform the duplicate video identification method of any one of the preceding first aspects.
According to the above technical solutions, the embodiments of the present application have the following advantages:
the repeated video identification method extracts a first multimedia data feature set in a first multimedia data stream by separating the first multimedia data stream of a video file to be identified, wherein the first multimedia data feature set comprises a plurality of first multimedia data frames, and the first multimedia data feature set is a video feature set of the video file to be identified; then matching the first multimedia data frame with a second multimedia data frame of the comparison video file to obtain a matching sequence pair set, wherein the matching sequence pair set comprises a plurality of matching sequence pairs, and the matching sequence pair set is a repeated part of the video file to be identified and the comparison video file; judging whether the proportion of the matching sequence pair set exceeds a preset threshold, and if so, determining that the video file to be identified and the comparison video file are repeated; if the specific gravity does not exceed the preset threshold, it is determined that the video file to be identified and the comparison video file are not repeated, and the preset threshold is used as a judgment standard for judging whether the video file to be identified and the comparison video file are repeated or not, so that the size of the preset threshold is adjusted according to actual needs to adapt to requirements. Therefore, the repeated video identification method can identify the repeated videos in a large number of videos, so that the user experience is optimized on one hand, and the video searching efficiency is improved on the other hand.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or intervening elements may also be present.
It should be noted that the terms of orientation such as left, right, up, down, etc. in the present embodiment are only relative concepts or reference to the normal use state of the product, and should not be considered as limiting.
The duplicate video identification method can be deployed in any system that needs to identify duplicate videos. For example, it can be deployed in a server of a large video website to perform duplicate video identification on the large number of videos managed by the server, so that a clear, comprehensive comparison conclusion is obtained for a manager to make decisions.
Referring to fig. 1, an embodiment of the duplicate video identification method of the present application includes:
101. Separate the first multimedia data stream of the video file to be identified.
The first multimedia data stream of the video file to be identified is separated first; it comprises the multimedia data of the video file to be identified, such as a video stream and an audio stream. Separating the first multimedia data stream from the video file to be identified facilitates the comparison and analysis of each type of multimedia data in the subsequent steps.
102. Extract a first multimedia data feature set from the first multimedia data stream, wherein the first multimedia data feature set comprises a plurality of first multimedia data frames.
After the first multimedia data stream of the video file to be identified is separated in step 101, this step extracts a first multimedia data feature set from the first multimedia data stream, where the first multimedia data feature set includes a number of first multimedia data frames. For example, the first multimedia data frames may be frames of furniture (sofas, television cabinets) or of building materials (tiles, ceilings) that reflect features of the home decoration field, and the set of these first multimedia data frames is the first multimedia data feature set.
103. Match the first multimedia data frames with the second multimedia data frames of the comparison video file to obtain a matching sequence pair set, wherein the matching sequence pair set comprises a plurality of matching sequence pairs.
In this step, the first multimedia data frames obtained in step 102 are matched with the second multimedia data frames of the comparison video file, so as to determine how many multimedia data frames are repeated between the first multimedia data frames representing the video file to be identified and the second multimedia data frames representing the comparison video file.
104. Determine whether the proportion occupied by the matching sequence pair set exceeds a preset threshold; if it does, execute step 105; if it does not, execute step 106.
For example, this step determines whether the proportion of the duration of the entire video file to be identified (or of the comparison video file) that is occupied by the duration corresponding to the matching sequence pair set from step 103 exceeds a preset threshold. If the proportion exceeds the preset threshold, many first multimedia data frames in the video file to be identified are the same as second multimedia data frames in the comparison video file; if it does not, few (or no) first multimedia data frames in the video file to be identified are the same as second multimedia data frames in the comparison video file.
105. Determine that the video file to be identified and the comparison video file are duplicates.
When step 104 determines that the proportion occupied by the matching sequence pair set exceeds the preset threshold, indicating that many first multimedia data frames in the video file to be identified are the same as second multimedia data frames in the comparison video file, this step determines that the video file to be identified and the comparison video file are duplicates.
106. Determine that the video file to be identified and the comparison video file are not duplicates.
When step 104 determines that the proportion occupied by the matching sequence pair set does not exceed the preset threshold, indicating that few (or no) first multimedia data frames in the video file to be identified are the same as second multimedia data frames in the comparison video file, this step determines that the video file to be identified and the comparison video file are not duplicates.
Therefore, when the duplicate video identification method is deployed in any system that needs to identify duplicate videos, for example in a server of a large video website that performs duplicate video identification on the large number of videos it manages, a clear, comprehensive comparison conclusion can be obtained for a manager to make decisions, duplicate videos can be accurately identified among the large number of videos, user experience is improved, and video search efficiency is raised.
Referring to fig. 2, another embodiment of the duplicate video identification method of the present application includes:
201. Separate the first multimedia data stream of the video file to be identified.
The execution of this step is similar to step 101 in the embodiment of fig. 1, and the repeated parts are not described again here.
It should be noted that many mature techniques exist in the prior art for separating the audio and video in a video file: the audio can be saved separately as an audio stream and the video separately as a video stream, and both belong to multimedia data streams. The first multimedia data stream of this embodiment may therefore be an audio stream and/or a video stream, that is, the first multimedia data stream includes a first video stream and/or a first audio stream.
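As one concrete illustration of such a separation step, the following Python sketch demuxes a video file into a first audio stream and a first video stream by invoking the external ffmpeg tool; the output file names, the stream-copy settings and the assumptions that ffmpeg is available and that the audio track is AAC are illustrative choices rather than requirements of the method.

```python
import subprocess

def separate_streams(video_path, audio_out="audio_stream.aac", video_out="video_stream.mp4"):
    """Demux a video file into a separate audio stream and a separate video stream.

    Relies on the external ffmpeg tool; the output containers and the
    assumption that the audio track is AAC are illustrative only.
    """
    # -vn drops the video track, -acodec copy keeps the audio bitstream as-is.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-acodec", "copy", audio_out],
        check=True,
    )
    # -an drops the audio track, -vcodec copy keeps the video bitstream as-is.
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-an", "-vcodec", "copy", video_out],
        check=True,
    )
    return audio_out, video_out
```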
202. Extract a first multimedia data feature set from the first multimedia data stream, wherein the first multimedia data feature set comprises a plurality of first multimedia data frames.
The execution of this step is similar to step 102 in the embodiment of fig. 1, and repeated descriptions are omitted here.
It is worth noting that the extraction precision of the first multimedia data feature set directly affects the result of duplicate video detection. This step can be based on a local-feature method: the Scale-Invariant Feature Transform (SIFT), a scale-space-based local feature point extraction and matching algorithm proposed by Lowe. SIFT remains robust under strong noise and under changes of brightness, viewing angle and rotation, and compared with other currently popular feature point extraction algorithms, the feature points extracted by SIFT are the most stable; each SIFT feature point is described as a 128-dimensional feature vector. A video file to be identified can be regarded as consisting of n multimedia data frames; feature points are computed for each multimedia data frame to obtain the first multimedia data feature set in the first multimedia data stream, where the first multimedia data feature set comprises a plurality of first multimedia data frames, and the first multimedia data frames include a first video frame and/or a first audio frame.
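As a minimal sketch of this extraction step, the following Python code samples frames from a video and computes 128-dimensional SIFT descriptors for each sampled frame with OpenCV; the sampling interval frame_step and the use of OpenCV are assumptions made for illustration only.

```python
import cv2

def extract_sift_feature_set(video_path, frame_step=25):
    """Build a multimedia data feature set: one SIFT descriptor matrix per sampled frame.

    frame_step (sampling every 25th frame) is an illustrative choice;
    cv2.SIFT_create requires OpenCV 4.4+ (or opencv-contrib-python).
    """
    sift = cv2.SIFT_create()
    cap = cv2.VideoCapture(video_path)
    feature_set = []          # list of (frame_number, 128-dimensional descriptor matrix)
    frame_number = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_number % frame_step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            keypoints, descriptors = sift.detectAndCompute(gray, None)
            if descriptors is not None:
                feature_set.append((frame_number, descriptors))
        frame_number += 1
    cap.release()
    return feature_set
```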
203. Match the first multimedia data frames with the second multimedia data frames of the comparison video file to obtain a matching sequence pair set, wherein the matching sequence pair set comprises a plurality of matching sequence pairs.
The execution of this step is similar to step 103 in the embodiment of fig. 1, and the repeated parts are not described herein again.
For example, let the first multimedia data frames have frame numbers f11, f12, f13, f14, ..., and let the second multimedia data frames have frame numbers f21, f22, f23, f24, .... One matching sequence pair is denoted F1(f11, f21), another F2(f12, f22), another F3(f13, f23), and so on, where F1, F2, F3, ... denote the matching sequence pairs with sequence numbers 1, 2, 3, .... In F1(f11, f21), the pair (f11, f21) indicates that the first multimedia data frame with frame number f11 in the video file to be identified matches, i.e. is repeated by, the second multimedia data frame with frame number f21 in the comparison video file. The second multimedia data frame includes a second video frame and/or a second audio frame.
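A minimal sketch of how such matching sequence pairs could be formed from per-frame SIFT descriptors follows; the brute-force matcher, the Lowe ratio of 0.75 and the acceptance level min_good_matches are illustrative assumptions, not values prescribed by this embodiment.

```python
import cv2

def build_matching_pairs(first_set, second_set, min_good_matches=20):
    """Pair each first multimedia data frame with its best-matching second frame.

    first_set / second_set are lists of (frame_number, descriptors), e.g. as
    produced by extract_sift_feature_set above.
    """
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    pairs = []  # the matching sequence pair set: (first_frame_number, second_frame_number)
    for f1, desc1 in first_set:
        best_frame, best_count = None, 0
        for f2, desc2 in second_set:
            # Lowe's ratio test keeps only distinctive descriptor matches.
            knn = matcher.knnMatch(desc1, desc2, k=2)
            good = [m for m, *rest in knn
                    if rest and m.distance < 0.75 * rest[0].distance]
            if len(good) > best_count:
                best_frame, best_count = f2, len(good)
        if best_frame is not None and best_count >= min_good_matches:
            pairs.append((f1, best_frame))
    return pairs
```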
204. Obtain the valid matching frame sequence pairs among the plurality of matching sequence pairs, wherein the matching sequence pair set comprises the valid matching frame sequence pairs.
It should be understood that the definition of duplicate videos in this step is: not only does the duration of the matching sequence pair set of the two video files exceed the preset threshold, but the matching sequence pairs in the matching sequence pair set also follow the same playing order; that is, the same video file played forward and played in reverse is treated by this scheme as two different video files. For this reason, this step needs to eliminate the invalid matching frame sequence pairs among the matching sequence pairs and keep only the valid matching frame sequence pairs in the matching sequence pair set. An invalid matching frame sequence pair is a matching sequence pair that would interfere with the determination of duplicate videos. For example, among the matching sequence pairs F1(f11, f21), F2(f12, f22), F3(f13, f23), ..., suppose the second pair is instead matched as F2(f12, f29), meaning that the first multimedia data frame with frame number f12 in the video file to be identified matches the second multimedia data frame with frame number f29 in the comparison video file. Although the two frames match in content, their position in the playing order differs from that of the neighbouring pairs, so it can be concluded that this frame of multimedia data has been edited and at least its playing position has been moved between the two video files. A match whose frames correspond in content but whose playing order has been modified is called an outlier match, and an outlier match is not treated as a repeated video frame; a match whose frames correspond in content and whose playing order has not been modified is called a valid match.
Specifically, this step extracts the first frame number of the first multimedia data frame and the second frame number of the second multimedia data frame in each matching frame sequence pair, for example f11 and f21 from F1(f11, f21), f12 and f29 from F2(f12, f29) (or f12 and f22 from F2(f12, f22)), f13 and f23 from F3(f13, f23), and so on. It is then judged whether the first frame numbers and the second frame numbers of adjacent matching frame sequence pairs are in incremental correspondence. For example, f11, f12, f13, ... increase, and the corresponding f21, f22, f23, ... also increase, so these first and second multimedia data frames can be considered valid matching frame sequence pairs. However, if f11, f12, f13, ... increase while the corresponding second frame numbers are f21, f29, f23, ..., the second frame numbers are clearly not in incremental correspondence; the matching sequence pair F2(f12, f29) is therefore an invalid matching frame sequence pair, is not a repetition, and should be eliminated in this step.
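The sketch below shows one way to realise this incremental-correspondence check: it keeps the largest subset of matching sequence pairs whose second frame numbers increase together with the first frame numbers, which discards outliers such as F2(f12, f29) in the example above. The longest-increasing-subsequence filter is an implementation assumption; the embodiment itself only requires the incremental-correspondence judgment.

```python
def filter_valid_pairs(pairs):
    """Keep only valid matching frame sequence pairs from (f1, f2) tuples.

    Pairs are sorted by first frame number; the longest strictly increasing
    subsequence of second frame numbers is retained, everything else is
    treated as an outlier match.
    """
    pairs = sorted(pairs)                     # sort by first frame number
    n = len(pairs)
    if n == 0:
        return []
    best_len = [1] * n                        # length of best chain ending at i
    prev = [-1] * n                           # back-pointer for reconstruction
    for i in range(n):
        for j in range(i):
            if pairs[j][1] < pairs[i][1] and best_len[j] + 1 > best_len[i]:
                best_len[i] = best_len[j] + 1
                prev[i] = j
    end = max(range(n), key=lambda i: best_len[i])
    valid = []
    while end != -1:
        valid.append(pairs[end])
        end = prev[end]
    return valid[::-1]
```

For instance, with integer stand-ins for the frame numbers above, filter_valid_pairs([(11, 21), (12, 29), (13, 23), (14, 24)]) drops the pair (12, 29) and keeps the three pairs whose second frame numbers increase in step with the first.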
205. Establish a target coordinate system, and form identifiers at the target coordinate points on the target coordinate system.
To make the description in step 204 more concrete, this step can establish a target coordinate system for all matching sequence pairs in the matching sequence pair set according to a fixed rule. Specifically, the index of the first frame number of the first multimedia data frame in each matching frame sequence pair is extracted as the coordinate value on the first coordinate axis; for example, the index of f11 in F1(f11, f21) is 1 (the prefix "f1" of f11 denotes the video file to be identified), so 1 is taken as the coordinate value on the first coordinate axis. The index of the second frame number of the second multimedia data frame in each matching frame sequence pair is extracted as the coordinate value on the second coordinate axis; for example, the index of f21 in F1(f11, f21) is also 1 (the prefix "f2" of f21 denotes the comparison video file), so 1 is the coordinate value on the second coordinate axis. A target coordinate system is then established with the first coordinate axis and the second coordinate axis, and each matching frame sequence pair corresponds to a coordinate point on the target coordinate system.
Further, if the first frame number of the first multimedia data frame in a matching frame sequence pair forms a valid match with the second frame number of the second multimedia data frame, this step forms an identifier at the corresponding target coordinate point on the target coordinate system (see fig. 6, where validly matched frame sequence pairs are marked at their target coordinate points with "+"), the target coordinate point taking the first frame number and the second frame number as its coordinate values. If the first frame number forms an outlier match with the second frame number, this step likewise forms an identifier at the target coordinate point (see fig. 6, where outlier-matched frame sequence pairs are marked at their target coordinate points with "x"). If the matching frame sequence pairs are valid matches, their coordinate points on the target coordinate system can be fitted to a monotonically increasing straight line, as shown in fig. 6; if they are outlier matches, their coordinate points form a set of randomly scattered points.
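A small plotting sketch of such a target coordinate system follows, marking valid matches with "+" and outlier matches with "x" as in fig. 6; matplotlib and the axis labels are illustrative choices.

```python
import matplotlib.pyplot as plt

def plot_target_coordinate_system(valid_pairs, outlier_pairs):
    """Draw the target coordinate system: frame numbers of the file to be identified
    on the first axis, frame numbers of the comparison file on the second axis."""
    if valid_pairs:
        x, y = zip(*valid_pairs)
        plt.plot(x, y, "+", label="valid match")        # valid matches fall on a rising line
    if outlier_pairs:
        x, y = zip(*outlier_pairs)
        plt.plot(x, y, "x", label="outlier match")      # outlier matches scatter randomly
    plt.xlabel("frame number in the video file to be identified")
    plt.ylabel("frame number in the comparison video file")
    plt.legend()
    plt.show()
```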
206. Determine whether the proportion occupied by the matching sequence pair set exceeds a preset threshold; if it does, execute step 207; if it does not, execute step 208.
Specifically, the duration of the video file to be identified is compared with the duration of the comparison video file to obtain a target video file, where the target video file is whichever of the two has the shorter duration, or either of the two when their durations are equal. This step then determines whether the proportion of the duration of the target video file occupied by the duration of the matching sequence pair set exceeds the preset threshold; in other words, the similarity between the matching sequence pair set and the multimedia data frames of the target video file is calculated, and when the two video files differ in duration the similarity is calculated with respect to the shorter one. If the proportion exceeds the preset threshold, many first multimedia data frames in the video file to be identified are the same as second multimedia data frames in the comparison video file; if it does not, few (or no) first multimedia data frames in the video file to be identified are the same as second multimedia data frames in the comparison video file.
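The decision of this step can be sketched as follows; seconds_per_pair (the playing time that one matching frame sequence pair stands for, e.g. frame_step divided by the frame rate) and the example threshold of 0.8 are assumptions for illustration, since the embodiment leaves the preset threshold to be set according to actual needs.

```python
def is_duplicate(valid_pairs, candidate_duration, reference_duration,
                 seconds_per_pair, threshold=0.8):
    """Decide duplication from the proportion occupied by the matching sequence pair set.

    candidate_duration / reference_duration are the durations (in seconds) of the
    video file to be identified and of the comparison video file.
    """
    # The target video file is whichever of the two files has the shorter duration.
    target_duration = min(candidate_duration, reference_duration)
    matched_duration = len(valid_pairs) * seconds_per_pair
    proportion = matched_duration / target_duration if target_duration else 0.0
    return proportion > threshold, proportion
```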
207. Determine that the video file to be identified and the comparison video file are duplicates.
When step 206 determines that the proportion occupied by the matching sequence pair set exceeds the preset threshold, indicating that many first multimedia data frames in the video file to be identified are the same as second multimedia data frames in the comparison video file, this step determines that the video file to be identified and the comparison video file are duplicates.
208. Determine that the video file to be identified and the comparison video file are not duplicates.
When step 206 determines that the proportion occupied by the matching sequence pair set does not exceed the preset threshold, indicating that few (or no) first multimedia data frames in the video file to be identified are the same as second multimedia data frames in the comparison video file, this step determines that the video file to be identified and the comparison video file are not duplicates.
The duplicate video identification method therefore also has advantages in time efficiency. Because local feature descriptors are used, feature extraction may take more time, but in the feature matching stage a locality-sensitive hashing (LSH) method can be adopted so that matching completes within linear time complexity O(m + n). Measurements show that the detection algorithm takes roughly 1/5 of the original video length, which meets the application requirements.
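As a rough illustration of how an LSH step can keep the matching near-linear, the sketch below buckets 128-dimensional descriptors with random-hyperplane hashing and only treats descriptors that fall into the same bucket as candidate matches; the number of hash bits and the use of a single hash table are simplifying assumptions, not the exact scheme used by this embodiment.

```python
import numpy as np

def lsh_candidate_matches(desc1, desc2, num_bits=16, seed=0):
    """Bucket two descriptor matrices with random-hyperplane LSH.

    desc1, desc2: float arrays of shape (m, 128) and (n, 128).
    Returns candidate index pairs (i, j) whose hash buckets coincide.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((num_bits, desc1.shape[1]))

    def bucket_keys(desc):
        bits = (desc @ planes.T) > 0              # sign pattern per descriptor
        return [tuple(row) for row in bits]

    table = {}
    for i, key in enumerate(bucket_keys(desc1)):  # single pass over desc1
        table.setdefault(key, []).append(i)

    candidates = []                               # (index in desc1, index in desc2)
    for j, key in enumerate(bucket_keys(desc2)):  # single pass over desc2
        for i in table.get(key, []):
            candidates.append((i, j))
    return candidates
```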
The above embodiments describe the duplicate video identification method of the present application; the duplicate video identification apparatus of the present application is described below. Referring to fig. 3, an embodiment of the duplicate video identification apparatus includes:
a separation unit 301, configured to separate a first multimedia data stream of a video file to be identified;
an extracting unit 302, configured to extract a first multimedia data feature set in the first multimedia data stream, where the first multimedia data feature set includes a number of first multimedia data frames;
a matching unit 303, configured to match the first multimedia data frame with a second multimedia data frame of a comparison video file to obtain a matching sequence pair set, where the matching sequence pair set includes a plurality of matching sequence pairs;
a determining unit 304, configured to determine whether a proportion occupied by the matching sequence pair set exceeds a preset threshold;
a first determining unit 305, configured to determine that the video file to be identified and the comparison video file are duplicates if the proportion exceeds the preset threshold;
a second determining unit 306, configured to determine that the video file to be identified and the comparison video file are not duplicates if the proportion does not exceed the preset threshold.
The operations performed by the duplicate video identification apparatus of this embodiment are similar to those in the embodiment of fig. 1 and are not repeated herein.
Therefore, when the duplicate video identification apparatus is deployed in any system that needs to identify duplicate videos, for example in a server of a large video website that performs duplicate video identification on the large number of videos it manages, a clear, comprehensive comparison conclusion can be obtained for a manager to make decisions, and duplicate videos can be accurately identified among the large number of videos.
Referring to fig. 4, another embodiment of the duplicate video identification apparatus includes:
a separating unit 401, configured to separate a first multimedia data stream of a video file to be identified;
an extracting unit 402, configured to extract a first multimedia data feature set in the first multimedia data stream, where the first multimedia data feature set includes a number of first multimedia data frames;
a matching unit 403, configured to match the first multimedia data frame with a second multimedia data frame of a comparison video file to obtain a matching sequence pair set, where the matching sequence pair set includes a plurality of matching sequence pairs;
a determining unit 404, configured to determine whether a proportion occupied by the matching sequence pair set exceeds a preset threshold;
a first determining unit 405, configured to determine that the video file to be identified and the comparison video file are duplicates if the proportion exceeds the preset threshold;
a second determining unit 406, configured to determine that the video file to be identified and the comparison video file are not duplicates if the proportion does not exceed the preset threshold.
Optionally, when determining whether the proportion occupied by the matching sequence pair set exceeds a preset threshold, the determining unit 404 is specifically configured to:
comparing the duration of the video file to be identified with the duration of the comparison video file to obtain a target video file, wherein the target video file is whichever of the video file to be identified and the comparison video file has the shorter duration, or either of the two when their durations are equal;
and determining whether the proportion of the duration of the target video file that is occupied by the duration of the matching sequence pair set exceeds the preset threshold.
Optionally, the apparatus further comprises:
an obtaining unit 407, configured to obtain valid matching frame sequence pairs from among the plurality of matching sequence pairs, where the matching sequence pair set includes the valid matching frame sequence pairs.
Optionally, the obtaining unit 407, when obtaining the valid matching frame sequence pairs from the plurality of matching sequence pairs, is specifically configured to:
respectively extracting a first frame number of the first multimedia data frame and a second frame number of the second multimedia data frame in each pair of the matching frame sequence pairs;
judging whether the first frame numbers and the second frame numbers of adjacent matching frame sequence pairs are in incremental correspondence;
and if the first frame numbers and the second frame numbers of the adjacent matching frame sequence pairs are in incremental correspondence, determining that the first multimedia data frame and the second multimedia data frame form a valid matching frame sequence pair.
Optionally, the apparatus further comprises:
the extracting unit 402, further configured to extract an index of the first frame number of the first multimedia data frame in each pair of the matching frame sequence pairs as a coordinate value of a first coordinate axis;
the extracting unit 402, further configured to extract an index of the second frame number of the second multimedia data frame in each pair of the matching frame sequence pairs as a coordinate value of a second coordinate axis;
and an establishing unit 408, configured to establish a target coordinate system by using the first coordinate axis and the second coordinate axis.
Optionally, the apparatus further comprises:
a forming unit 409, configured to form an identifier at a target coordinate point on the target coordinate system if the first frame number of the first multimedia data frame in each pair of the matching frame sequence pairs is effectively matched with the second frame number of the second multimedia data frame, where the target coordinate point uses the first frame number and the second frame number as coordinate values.
Optionally, the first multimedia data stream includes: a first video stream and/or a first audio stream;
the first multimedia data frame comprises: a first video frame and/or a first audio frame;
the second multimedia data frame comprises: a second video frame and/or a second audio frame.
Optionally, the first video frame includes a home decoration domain feature.
The operations performed by the duplicate video identification apparatus of this embodiment are similar to those in the embodiment of fig. 2 and are not repeated herein.
Referring to fig. 5, a computer device of an embodiment of the present application is described below; the embodiment of the computer device includes:
The computer device 500 may include one or more processors (CPUs) 501 and a memory 502, where the memory 502 stores one or more application programs or data. The memory 502 may be volatile storage or persistent storage. The program stored in the memory 502 may include one or more modules, each of which may include a series of instruction operations on the computer device. Further, the processor 501 may communicate with the memory 502 to execute the series of instruction operations in the memory 502 on the computer device 500. The computer device 500 may also include one or more wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems, such as Windows Server, Mac OS, Unix, Linux and FreeBSD. The processor 501 may perform the operations performed in the embodiments shown in fig. 1 or fig. 2, which are not described herein again.
In the several embodiments provided in the present application, it should be understood by those skilled in the art that the disclosed system, apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a logical functional division, and other divisions may be used in practice; for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. If the integrated unit is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and the like.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.