Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Image compensation (image interpolation) has long been an important research branch of computer vision. In the related art, interpolation algorithms can be divided into two categories: one calculates the missing region using a traditional computational method and iterates to convergence, or compensates the image in an optimized manner, for example, the image interpolation technique based on the Fast Marching Method proposed by Alexandru Telea; the other is biased toward solving the compensation problem with machine learning.
For a traditional compensation algorithm, the iteration speed is low and the compensation effect is mediocre. For an algorithm based on machine learning, the prediction and filling process is time-consuming, and its effect is likewise mediocre across different scenes.
Therefore, the mature algorithms in the related art are better suited to processing static images, and applying them to real-time video processing remains difficult. The present invention considers the specificity of the problem it solves: the subtitle is to be eliminated, its region is fixed, the region where the subtitle appears is usually not in the most important viewing area, and the characters contain many curved regions.
The system related to the embodiment of the present invention may be a distributed system formed by connecting a client and a plurality of nodes (computing devices of any form in an access network, such as servers and user terminals) through network communication.
Taking a blockchain system as an example of a distributed system, referring to fig. 1, fig. 1 is an optional structural schematic diagram of a distributed system 100 provided by an embodiment of the present invention applied to a blockchain system. The system is formed by a plurality of nodes 200 (computing devices of any form in the access network, such as servers and user terminals) and a client 300. A Peer-to-Peer (P2P) network is formed between the nodes, and the P2P protocol is an application layer protocol running on the Transmission Control Protocol (TCP). In a distributed system, any machine, such as a server or a terminal, can join to become a node; a node comprises a hardware layer, an intermediate layer, an operating system layer, and an application layer.
Referring to the functions of each node in the blockchain system shown in fig. 1, the functions involved include:
1) routing, a basic function that a node has, is used to support communication between nodes.
Besides the routing function, the node may also have the following functions:
2) the application, which is deployed in the blockchain to implement specific services according to actual service requirements, records data related to the implemented functions to form record data, carries a digital signature in the record data to indicate the source of the task data, and sends the record data to other nodes in the blockchain system, so that the other nodes add the record data to a temporary block when the source and integrity of the record data are verified successfully.
For example, the services implemented by the application include:
2.1) wallet, for providing functions of electronic money transactions, including initiating a transaction (i.e., sending the transaction record of the current transaction to other nodes in the blockchain system, which, after successful verification, store the record data of the transaction in a temporary block of the blockchain as a response acknowledging that the transaction is valid); of course, the wallet also supports querying the electronic money remaining at an electronic money address;
and 2.2) shared ledger, for providing functions of operations such as storage, query, and modification of account data; record data of the operations on the account data are sent to other nodes in the blockchain system, and after the other nodes verify their validity, the record data are stored in a temporary block as a response acknowledging that the account data are valid, and a confirmation may be sent to the node initiating the operations.
2.3) smart contracts: computerized agreements that can enforce the terms of a contract, implemented by code deployed on the shared ledger and executed when certain conditions are met, for completing automated transactions according to actual business-requirement code, such as querying the logistics status of goods purchased by a buyer, or transferring the buyer's electronic money to the merchant's address after the buyer signs for the goods; of course, smart contracts are not limited to contracts for executing transactions, but may also execute contracts that process received information.
3) the blockchain, which comprises a series of blocks connected to one another in the chronological order of their generation; new blocks cannot be removed once added to the blockchain, and the blocks record the data submitted by nodes in the blockchain system.
Referring to fig. 2, fig. 2 is an optional schematic diagram of a block structure according to an embodiment of the present invention. Each block includes the hash value of the transaction records stored in the block (the hash value of the block) and the hash value of the previous block, and the blocks are connected by these hash values to form a blockchain. A block may also include information such as a timestamp of when the block was generated. A blockchain is essentially a decentralized database: a string of data blocks associated using cryptography, each data block containing related information for verifying the validity (anti-counterfeiting) of its information and for generating the next block.
The scheme of the embodiment of the invention can be applied to a mobile terminal player in the related technology, and the applicable use scene can comprise the following steps:
step 1, a user device plays a video with embedded subtitles, wherein the embedded subtitles cannot be directly replaced like an externally-hung subtitle;
step 2, the user wants to change the caption, initiates a request for changing the caption through the user equipment, and selects the caption which is wanted to be watched;
step 3, the server analyzes the movie picture, eliminates the original subtitles, and adds subtitles selected by the user in the video;
and 4, sending the video after replacing the subtitle to the user equipment.
According to an aspect of the embodiments of the present invention, there is provided an image compensation method, which may be applied, but not limited, to the environment shown in fig. 3 as an optional implementation manner.
Optionally, in this embodiment, the image compensation method may be applied, but is not limited, to the server 304, for assisting the application client in processing the initiated subtitle replacement request. The application client may be, but is not limited to being, run in the user equipment 302, and the user equipment 302 may be, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a PC, or other terminal equipment that supports running the application client. The server 304 and the user equipment 302 may, but are not limited to, exchange data via a network 310, which may include, but is not limited to, a wireless network or a wired network. The wireless network includes: Bluetooth, WIFI, and other networks that enable wireless communication. The wired network may include, but is not limited to: wide area networks, metropolitan area networks, and local area networks. The above is merely an example, and this embodiment is not limited thereto. The server 304 and the user equipment 302 may each be a node in a blockchain.
Optionally, as an optional implementation manner, as shown in fig. 4, the image compensation method includes the following steps:
step S402, acquiring a first frame image including embedded subtitles in a video to be processed and a first area of the embedded subtitles in the first frame image;
before executing step S402, receiving a request for replacing the embedded subtitle, which is initiated by a user through a client application program, may also be included.
The frame image containing the embedded caption can be identified by using a machine learning model for image identification in the related art, and the frame image containing the embedded caption can also be identified by adopting the scheme of the subsequent optional embodiment of the invention.
Step S404, searching a second frame image without embedded subtitles in the video to be processed, wherein the pixel distance between the second frame image and the first frame image meets the target condition;
the pixel distance between the two frame images meeting the target condition indicates that the two frame images are similar, i.e., exhibit similar pictures. The pixel distance between the two frame images may be a difference value by subtracting pixel values of the two frame images, or may be calculated by a mahalanobis distance or a euclidean distance in the related art. The pixel distance meeting the target condition may be less than a second target threshold.
Step S406, under the condition that the second frame image is found, performing image compensation on the pixel point on the first region by using the pixel point on the second region corresponding to the first region in the second frame image, so as to eliminate the embedded subtitle on the first region.
The correspondence relationship between the second area and the first area may be: the relative position of the second area on the second frame image is the same as that of the first area on the first frame image; for example, if the first area is at the bottom of the first frame image, the second area is also at the bottom of the second frame image. For example, when the embedded subtitle is displayed at a position below the foot of a distant mountain shown in the first frame image, the area where the embedded subtitle is located is replaced by the corresponding area below the foot of the mountain in a second frame image that also displays the mountain.
By adopting this scheme, the pixel distance between frame images is calculated, a second frame image that has no embedded subtitle and is similar to the first frame image with the embedded subtitle is obtained, and the subtitle area in the first frame image is covered by the corresponding local area of the second frame image, so that the embedded subtitle is removed quickly, accurately, and in real time.
Alternatively, after the image compensation is performed in step S406, the subtitle selected by the user may be added to the first frame image, so as to form a video after replacing the subtitle for the user to watch.
Optionally, the image compensation of the pixel points in the first region by using the pixel points in the second region corresponding to the first region in the second frame image includes: replacing the pixel values of the pixel points in the first region with the pixel values of the corresponding pixel points in the second region of the second frame image.
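The replacement described above amounts to a region copy between two frames; a minimal sketch, assuming the region is given as rectangular (top, bottom, left, right) bounds:

```python
import numpy as np

def compensate_region(first_frame, second_frame, region):
    """Replace the subtitle region of the first frame with the pixels at
    the same position in the second frame.
    region = (top, bottom, left, right), half-open bounds (assumed form)."""
    top, bottom, left, right = region
    out = first_frame.copy()          # leave the original frame untouched
    out[top:bottom, left:right] = second_frame[top:bottom, left:right]
    return out
```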
Optionally, the searching for a second frame image without an embedded subtitle in the video to be processed, whose pixel distance from the first frame image meets the target condition, includes: searching forward or backward from the first frame image in the video to be processed for a target frame image without an embedded subtitle, wherein the time interval between the target frame image and the first frame image is less than a first target threshold, and the pixel distance between the pixel points of the target frame image and those of the first frame image is less than a second target threshold; and determining the target frame image as the second frame image when a target frame image without an embedded subtitle is found. If the first frame image is at a first time in the video to be processed, frame images before or after the first time are searched; the first target threshold of the search range may be, for example, frame images within 1 second. By adopting this scheme, a second frame image similar to the first frame image can be found quickly and accurately.
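The forward/backward search above can be sketched as follows; the list-based frame store, the per-frame subtitle flags, and the thresholds are illustrative assumptions (one second is approximated as `fps` frames):

```python
import numpy as np

def find_replacement_frame(frames, has_subtitle, index, fps=24,
                           second_target_threshold=10.0):
    """Search forward and backward from `index` (within one second, i.e.
    `fps` frames) for a subtitle-free frame whose pixel distance to
    frames[index] is below the threshold; return its index or None."""
    target = frames[index].astype(np.int32)
    for offset in range(1, fps + 1):           # nearest frames first
        for j in (index - offset, index + offset):
            if 0 <= j < len(frames) and not has_subtitle[j]:
                dist = np.mean(np.abs(frames[j].astype(np.int32) - target))
                if dist < second_target_threshold:
                    return j
    return None
```

Scanning the smallest offsets first favors the temporally closest, hence most similar, replacement frame.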
Optionally, after searching for a second frame image without embedded subtitles in the video to be processed, where a pixel distance between the second frame image and the first frame image meets a target condition, when the second frame image is not found, performing region segmentation on characters in the embedded subtitles in the first frame image to obtain a group of character regions; executing the following steps for each first pixel point in each character area, wherein each first pixel point is regarded as a current pixel point when the following steps are executed: acquiring second pixel values of four second pixel points which are closest to the current pixel point in four directions of up, down, left and right in the first frame of image, and acquiring an average value of first pixel values of all the first pixel points in the character area where the current pixel point is located, wherein the second pixel points are not located in the group of character areas; and determining a target pixel value of the current pixel point according to the second pixel values and the mean value of the four second pixel points, and updating the pixel value of the current pixel point to the target pixel value. The set of character regions in this embodiment is a combination of character regions, and the embedded subtitle in the first frame image may include a plurality of characters, and one or more characters are divided into one character region. The characters can be characters of languages of various countries, such as a Chinese character, an English word and the like, and if the pixel points of the frame image are dense enough, the characters can also be radicals of the Chinese character or an English letter and the like. The second pixel point is not located in the group of character areas, namely is not located in the range of the character areas and belongs to the pixel point in the area without the embedded caption. 
The average value is an average value of the first pixel values of all the first pixel points in the character region.
Optionally, performing region segmentation on the characters in the embedded subtitle in the first frame image to obtain a group of character regions includes: acquiring the sum of the pixel values of each column of pixel points in the first region; determining the pixel points on a target column whose sum of pixel values in the first region is smaller than a third target threshold as pixel points on a dividing line between adjacent characters; and determining the group of character regions according to the dividing lines and the boundary of the first region, wherein each character region includes one or more characters. The dividing line may have a certain width; for example, the area between two columns of pixel points is the dividing line. Whether a character region includes multiple characters depends on the pixel resolution of the frame image and the spacing between characters; for example, if the characters in the embedded subtitle are displayed in an enlarged manner for elderly users, the spacing between characters is large and each character region includes only one character. If the pixel resolution of the frame image is high, the frame image can be divided into fine character regions, and each character region may include only one character or even only one radical, with the remaining radicals belonging to another character region. Theoretically, the smaller the range of each divided character region, the better the final image compensation effect, and the less likely the audience is to notice the influence of subtitle replacement on the display of the frame image.
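The column-projection segmentation just described can be sketched as below; the input is assumed to be a binarized 0/1 array of the subtitle region, and the threshold value is illustrative:

```python
import numpy as np

def split_characters(binary_region, third_target_threshold=1):
    """Split a binarized subtitle region into character regions.
    Columns whose pixel sum falls below the threshold are treated as
    dividing lines; runs of the remaining columns become regions."""
    col_sums = binary_region.sum(axis=0)
    is_char = col_sums >= third_target_threshold
    regions, start = [], None
    for x, flag in enumerate(is_char):
        if flag and start is None:
            start = x                      # a character run begins
        elif not flag and start is not None:
            regions.append((start, x))     # [start, x) column span
            start = None
    if start is not None:                  # run reaching the right edge
        regions.append((start, len(is_char)))
    return regions
```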
Alternatively, when there are multiple lines of characters in the subtitle region, the region may be divided by columns and then by rows, so as to obtain more accurate character regions. When dividing by rows, the sum of the pixel values of the pixel points in each row of the first region is obtained, and the pixel points on a target row whose sum of pixel values in the first region is smaller than a fifth target threshold are used as pixel points on the dividing line between two adjacent lines of characters.
Optionally, determining a target pixel value of the current pixel point according to the second pixel values of the four second pixel points and the mean value, includes:
calculating the target pixel value P of the current pixel point by the following combination of formulas:

Gh = abs(Pleft - Pright);

Gv = abs(Pup - Pdown);

wherein ATi is the mean value; Pleft is the second pixel value of the pixel point located to the left of the current pixel point among the four second pixel points; Pright is the second pixel value of the pixel point located to the right of the current pixel point among the four second pixel points; Pup is the second pixel value of the pixel point located above the current pixel point among the four second pixel points; Pdown is the second pixel value of the pixel point located below the current pixel point among the four second pixel points; and abs() is the absolute-value function.
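The gradients Gh and Gv can be computed directly from the four neighboring pixel values. The exact rule combining them with the mean ATi into P is not spelled out above, so the sketch below assumes one plausible rule purely for illustration: interpolate along the direction with the smaller gradient, then average with the character-region mean.

```python
def fill_pixel(p_left, p_right, p_up, p_down, region_mean):
    # Horizontal and vertical gradients as defined in the text.
    g_h = abs(p_left - p_right)
    g_v = abs(p_up - p_down)
    # Assumed combination (the text leaves the final formula implicit):
    # interpolate along the direction of smaller gradient, then average
    # with the mean ATi of the character region.
    if g_h <= g_v:
        directional = (p_left + p_right) / 2.0
    else:
        directional = (p_up + p_down) / 2.0
    return (directional + region_mean) / 2.0
```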
Optionally, the obtaining a first frame image of an embedded subtitle in a video to be processed and a first region where the embedded subtitle is located in the first frame image includes: performing the following steps for each frame image in the video to be processed, wherein each frame image is regarded as a current frame image when the following steps are performed:
cutting off a partial area of the current frame image to obtain a first local frame image, wherein the partial area may be an area of the frame image in which no subtitle exists; for example, since subtitles are generally located at the bottom of the frame image, the upper partial area of the frame image can be cut off to reduce the data amount of subsequent image processing.
Performing image binarization on the first local frame image to obtain a second local frame image, wherein the pixel points in the second local frame image are black pixel points or white pixel points. The image binarization can be performed using the Otsu method in the related art, with the pixel values of black pixel points set to 0 and those of white pixel points set to 1, or the pixel values of white pixel points uniformly set to 255.
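The binarization step above references the Otsu method; a minimal NumPy sketch of that method is shown below, outputting 0/255 as one of the conventions mentioned (the function name is an assumption):

```python
import numpy as np

def otsu_binarize(gray):
    """Otsu's method: pick the threshold maximizing the between-class
    variance of the grayscale histogram, then map pixels to 0 or 255."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    total = hist.sum()
    sum_all = np.dot(np.arange(256), hist)
    best_t, best_var = 0, -1.0
    w0 = cum0 = 0.0
    for t in range(256):
        w0 += hist[t]                      # weight of the dark class
        if w0 == 0 or w0 == total:
            continue
        cum0 += t * hist[t]
        m0 = cum0 / w0                     # mean of the dark class
        m1 = (sum_all - cum0) / (total - w0)
        var = w0 * (total - w0) * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return np.where(gray > best_t, 255, 0).astype(np.uint8)
```

In practice an off-the-shelf implementation (e.g. OpenCV's Otsu thresholding) would typically be used instead.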
Acquiring the sum of the pixel values of each column of pixel points in the second local frame image and the sum of the pixel values of each row of pixel points in the second local frame image; the sum of the pixel values of each column of pixel points corresponds to the projection in the horizontal-axis direction in the subsequent embodiment, and the sum of the pixel values of each row of pixel points corresponds to the projection in the vertical-axis direction in the subsequent embodiment.
Searching for a start column, an end column, a start row, and an end row in the second local frame image, wherein the start column is the first target column from left to right in the second local frame image such that the difference between the sum of pixel values of the pixel points on the first target column and the sum of pixel values of the pixel points on the column located to its left and adjacent to it is greater than a fourth target threshold; the end column is the first second target column from right to left in the second local frame image such that the difference between the sum of pixel values of the pixel points on the second target column and the sum of pixel values of the pixel points on the column located to its right and adjacent to it is greater than the fourth target threshold; the start row is the first target row from top to bottom in the second local frame image such that the difference between the sum of pixel values of the pixel points on the first target row and the sum of pixel values of the pixel points on the adjacent row above it is greater than the fourth target threshold; and the end row is the first second target row from bottom to top in the second local frame image such that the difference between the sum of pixel values of the pixel points on the second target row and the sum of pixel values of the pixel points on the adjacent row below it is greater than the fourth target threshold. The fourth target thresholds used for the start column, end column, start row, and end row are here the same, but different fourth target thresholds may also be used and remain within the scope of the present invention.
When the start column, end column, start row, and end row are found, the current frame image is determined as a first frame image including an embedded subtitle, and the region enclosed by the start column, end column, start row, and end row is determined as the first region where the embedded subtitle is located in the first frame image. By adopting this scheme, the region of the embedded subtitle is obtained accurately and quickly by identifying the start column, end column, start row, and end row. If no start column, end column, start row, or end row satisfying the conditions is detected in a frame image, it may be determined that the frame image does not include an embedded subtitle.
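The projection scan above can be sketched as follows; the input is assumed to be the binarized local frame image, and the shared threshold value is illustrative:

```python
import numpy as np

def find_subtitle_bounds(binary, fourth_target_threshold=100):
    """Scan the column/row pixel-sum projections from each side for the
    first jump larger than the threshold; return (start_col, end_col,
    start_row, end_row), or None if no subtitle-like region is found."""
    col = binary.sum(axis=0).astype(np.int64)
    row = binary.sum(axis=1).astype(np.int64)

    def first_jump(sums):
        # First index whose sum exceeds its predecessor by the threshold.
        for i in range(1, len(sums)):
            if sums[i] - sums[i - 1] > fourth_target_threshold:
                return i
        return None

    start_col = first_jump(col)
    end_col_r = first_jump(col[::-1])      # scan right-to-left
    start_row = first_jump(row)
    end_row_r = first_jump(row[::-1])      # scan bottom-to-top
    if None in (start_col, end_col_r, start_row, end_row_r):
        return None                        # no embedded subtitle detected
    return (start_col, len(col) - 1 - end_col_r,
            start_row, len(row) - 1 - end_row_r)
```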
According to another embodiment of the present invention, there is also provided a subtitle positioning method, as shown in fig. 5, the method including the steps of:
S502, performing image binarization processing on the first frame image to be processed to obtain a processed third image, wherein the pixel points in the third image are black pixel points or white pixel points;
Before the binarization processing is performed on the first frame image, the first frame image may first be cropped; for example, since the subtitle is generally at the bottom of the frame image, the upper area of the first frame image is cropped and the lower area is retained, so as to reduce the subsequent data processing amount and ensure the speed of subtitle positioning. The third image here may correspond to the second local frame image in the above-described embodiment.
S504, acquiring the sum of the pixel values of each column of pixel points in the third image and the sum of the pixel values of each row of pixel points in the third image;
S506, detecting whether a target column and a target row exist in the third image, wherein the difference between the sum of the pixel values of the pixel points on the target column and the sum of the pixel values of the pixel points on an adjacent column is greater than a fourth target threshold, and the difference between the sum of the pixel values of the pixel points on the target row and the sum of the pixel values of the pixel points on an adjacent row is greater than the fourth target threshold;
S508, when the target column and the target row are found in the third image, determining that the first frame image includes an embedded subtitle, and determining the region enclosed by the target column and the target row as the first subtitle region of the embedded subtitle.
By adopting the scheme, whether the embedded subtitle exists in the image or not is determined by rapidly identifying whether the target column or the target row exists in the image or not, and the position of the embedded subtitle is rapidly and accurately found, so that the problems of low speed and low accuracy in positioning the subtitle in the image in the related technology are solved, and the efficiency of positioning the subtitle is greatly improved.
Optionally, after the region surrounded by the target column and the target row is determined as the first subtitle region of the embedded subtitle, when the first frame image belongs to one frame image in the video to be processed, for a fourth frame image in the video to be processed except the first frame image, preliminarily determining that the subtitle region of the fourth frame image is a fourth position corresponding to the first subtitle region; searching the target column or the target row at the fourth position, wherein when the target column or the target row is detected, the fourth position is determined to be a subtitle area of the fourth frame image; and when the target column or the target row is detected to be absent, determining that the fourth frame image does not comprise the subtitles. By adopting the scheme, after the caption position of one frame of image in the video to be processed is detected, the caption positions of other frames of images in the video are determined to be at similar positions, so that the data processing amount is reduced, the caption positioning of other frames of images is completed by combining the subsequent verification step, and a large amount of calculation resources consumed by one-frame calculation are avoided.
Optionally, detecting whether the third image has a target column and a target row includes: the target column comprises a start column and an end column, and the target row comprises a start row and an end row, wherein the start column is the first target column from left to right in the third image such that the difference between the sum of pixel values of the pixel points on the first target column and the sum of pixel values of the pixel points on the column located to its left and adjacent to it is greater than a fourth target threshold; the end column is the first second target column from right to left in the third image such that the difference between the sum of pixel values of the pixel points on the second target column and the sum of pixel values of the pixel points on the column located to its right and adjacent to it is greater than the fourth target threshold; the start row is the first target row from top to bottom in the third image such that the difference between the sum of pixel values of the pixel points on the first target row and the sum of pixel values of the pixel points on the adjacent row above it is greater than the fourth target threshold; and the end row is the first second target row from bottom to top in the third image such that the difference between the sum of pixel values of the pixel points on the second target row and the sum of pixel values of the pixel points on the adjacent row below it is greater than the fourth target threshold;
and when the target column and the target row are found in the third image, determining that the first frame image includes an embedded subtitle and determining the region enclosed by the target column and the target row as the first subtitle region of the embedded subtitle includes: when the start column, end column, start row, and end row are found, determining that the first frame image includes an embedded subtitle, and determining the region enclosed by the start column, end column, start row, and end row as the first region where the embedded subtitle is located in the first frame image.
According to another embodiment of the present invention, there is also provided a subtitle replacing method, as shown in fig. 6, including the steps of:
step S602, receiving a request signal for replacing subtitles of a video to be processed, wherein the request signal is used for requesting to replace subtitles embedded in the video to be processed with subtitles to be replaced;
the request signal may be initiated by the user account or by the video manager account. For example, if the user account belongs to the elderly, the request signal may be a request to replace the original subtitle with a subtitle of a larger font size. Or the user account belongs to a user who does not know the foreign language, the request signal can be the subtitle of the local language instead of the foreign language subtitle.
Step S604, acquiring a first frame image including embedded subtitles in the video to be processed and a first area where the embedded subtitles are located in the first frame image, and searching a second frame image without the embedded subtitles, wherein the pixel distance between the second frame image and the first frame image in the video to be processed meets a target condition;
step S606, under the condition that the second frame image is found, using the pixel point on the second area corresponding to the first area in the second frame image to perform image compensation on the pixel point on the first area so as to eliminate the embedded caption on the first area;
step S608, displaying the subtitle to be replaced in the first frame image after image compensation.
Optionally, the subtitle to be replaced is displayed at the original position of the embedded subtitle.
By adopting this scheme, the pixel distance between frame images is calculated, a second frame image that has no embedded subtitle and is similar to the first frame image with the embedded subtitle is obtained, and the subtitle area in the first frame image is covered by the corresponding local area of the second frame image, so that the embedded subtitle is removed quickly, accurately, and in real time. The subtitle to be replaced is then displayed, which greatly improves the speed of subtitle replacement, avoids stalls caused by slow subtitle replacement during video playback, and solves the technical problem in the related art that subtitle replacement consumes a large amount of computing resources.
In order to analyze the caption background more accurately, the invention provides a scanning algorithm based on caption features to accurately position the caption background. The caption has the following characteristics that the position of the caption is relatively fixed; most subtitles adopt colors with higher brightness; the caption is positioned below the frame image; the subtitles may not be included in the frame images in the video.
Based on the above features, in order to reduce the amount of calculation, subtitle positioning may be performed on only some frame images in the video to be processed, and the subsequent frame images may be handled by feature matching and positioning. The specific implementation steps are as follows:
the frame image sampling and positioning method comprises the following steps:
for caption positioning, some frame images in the video are selected for accurate positioning, and the caption positions in the rest frame images in the video can be determined according to the property that the positions are relatively unchanged.
Step 1, sampling frame images: generally, a video contains at least 24 frames of images per second; in order to avoid repeated information among the sampled frame images, the frames are sampled sparsely, and the sampled frame images are recorded as:
{I_1, I_2, ..., I_samples}
Step 2, subtitle positioning preprocessing I: according to the feature that subtitles are located in the lower part of the frame image, each frame image is cropped and the bottom quarter of the frame image is retained. The cropped frame images are recorded as:
{I'_1, I'_2, ..., I'_samples}
the clipping offset is noted as:
offset
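Steps 1 and 2 above can be sketched as follows. The frame representation (a list of pixel rows), the one-frame-per-second sampling rate, and the function names are illustrative assumptions, not the patented implementation:

```python
def sample_frames(frames, fps=24):
    """Take one frame per second to avoid repeated information."""
    return frames[::fps]

def crop_bottom_quarter(frame):
    """Keep the bottom quarter of the frame; return (cropped, offset)."""
    height = len(frame)
    offset = height * 3 // 4          # clipping offset noted in the text
    return frame[offset:], offset

# Tiny example: an 8-row "frame" of one-pixel rows.
frame = [[row] for row in range(8)]
cropped, offset = crop_bottom_quarter(frame)
print(offset, cropped)  # 6 [[6], [7]]
```

The offset is kept alongside the crop so that any subtitle coordinates found in the cropped image can be mapped back to the full frame.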
Step 3, subtitle positioning preprocessing II: as indicated above, subtitles are usually rendered in a color with higher brightness, such as white. In order to separate the subtitles from the background, a binarization method and morphological operators are used to expand the foreground features. The invention therefore processes I'_samples as follows:
binary () represents a binarization algorithm, such as Otsu's method.
Secondly, morphological filtering including dilation and erosion is performed to reduce holes in the frame image, so that the processed image forms a connected foreground region. The foreground region is generally the region where the subtitles are located, and the background region is the non-subtitle region. This can be formulated as:
the order used in the above formula is erosion first and then dilation.
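The binarization and morphological filtering described above can be sketched as follows. A fixed threshold stands in for Otsu's method, and the one-dimensional, 3-element horizontal structuring element is a simplification for illustration:

```python
def binarize(row, thresh=128):
    """Stand-in for binary(): threshold a row of grayscale values."""
    return [1 if p >= thresh else 0 for p in row]

def erode(row):
    # A pixel survives only if it and both horizontal neighbours are set.
    return [1 if 0 < i < len(row) - 1 and row[i - 1] and row[i] and row[i + 1]
            else 0 for i in range(len(row))]

def dilate(row):
    # A pixel is set if it or either horizontal neighbour is set.
    return [1 if any(row[max(0, i - 1):i + 2]) else 0 for i in range(len(row))]

gray = [200, 200, 200, 0, 200, 0, 0, 0]
binary = binarize(gray)          # [1, 1, 1, 0, 1, 0, 0, 0]
opened = dilate(erode(binary))   # erosion first, then dilation, as above
print(opened)                    # [1, 1, 1, 0, 0, 0, 0, 0]
```

Note how the isolated foreground pixel (index 4) is removed while the solid run on the left survives, which is the hole-and-spur reduction the text describes.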
Step 4, a fast positioning algorithm based on projection scanning:
after the preprocessing in step 2 and step 3, the frame image to be positioned has an obvious separation between foreground and background. In this step, the frame image is projected in the horizontal-axis direction and the vertical-axis direction, and the projection values are as follows:
I_hori = [D_1, D_2, ..., D_width]
I_vert = [D_1, D_2, ..., D_height]
where I_vert is the vertical-axis projection value and I_hori is the horizontal-axis projection value. To reduce spurs in the data, the projection values may be smoothed. The horizontal-axis projection adds up the pixel values of the column of pixel points sharing the same horizontal coordinate in the frame image, and the vertical-axis projection adds up the pixel values of the row of pixel points sharing the same vertical coordinate in the frame image.
Because the preprocessed frame image is relatively clean, the position where the projection value changes sharply is the position where the foreground first appears. Using this property, this embodiment first estimates the maximum change value and then determines the appearance position according to that change value. The pseudo code is as follows:
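The pseudo code is not reproduced in the text; the following Python sketch shows one plausible reading of it (estimate the maximum change between adjacent projection values, then take the first position attaining that change as the position where the foreground appears). The sample projection values are illustrative:

```python
def first_foreground_position(projection):
    """Locate the first sharp change in a (smoothed) projection."""
    changes = [abs(projection[i + 1] - projection[i])
               for i in range(len(projection) - 1)]
    max_change = max(changes)            # estimated maximum change value
    return changes.index(max_change) + 1 # first position reaching it

# Background columns project near 0; the subtitle region jumps sharply.
proj = [0, 1, 0, 55, 60, 58, 30, 0]
print(first_foreground_position(proj))  # 3
```

Running the same function on I_hori and I_vert yields the left and top boundaries of the subtitle region, respectively.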
After the above subtitle positioning method is applied, the subtitle area in a frame image can be found. Once the subtitle areas of several frames have been identified, it is assumed that the same area in the remaining frame images also contains subtitles, and that area is then checked: a remaining frame image is determined to contain no subtitles when the horizontal-axis or vertical-axis projection value of that area is 0 or lower than the fourth target threshold.
After subtitle recognition is carried out, subtitle elimination is carried out based on the context of the video to be processed and the local area in the frame image in the following mode.
Strategy 1, compensation strategy based on subtitle-free frame images:
because the video to be processed also contains frame images without subtitles, such frame images can be used as reference frame images, and the specific implementation manner of the strategy is as follows:
Step 1, judging whether the current frame image needs subtitle elimination;
step 2, for a frame image that requires subtitle elimination, performing a forward search and/or a backward search from the position of the frame image in the video to be processed to obtain target frame images, wherein the time interval range of the search is one second before and after, and the current frame image is recorded as E_0.
Step 3, judging whether each target frame image contains subtitles by using the frame image sampling and positioning method provided in this embodiment; if it does not, adding it to a subtitle-free set:
E = {E_1, ..., E_n}
Step 4, matching the frame images in the subtitle-free set. To simplify calculation, this strategy adopts a simple pixel-by-pixel comparison, namely calculating the pixel distance between two frame images:
dis = diff(E_0, E_1)
The pixel distance here may be a Mahalanobis distance or a Euclidean distance in the related art.
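The pixel-by-pixel comparison of step 4 can be sketched with a Euclidean distance, one of the two options mentioned above; the tiny frames and the threshold value are illustrative assumptions:

```python
import math

def pixel_distance(frame_a, frame_b):
    """Euclidean distance between two equally sized grayscale frames."""
    return math.sqrt(sum((a - b) ** 2
                         for ra, rb in zip(frame_a, frame_b)
                         for a, b in zip(ra, rb)))

e0 = [[10, 20], [30, 40]]
e1 = [[10, 22], [30, 40]]   # nearly identical candidate frame
e2 = [[90, 90], [90, 90]]   # very different candidate frame
SECOND_TARGET_THRESHOLD = 5.0
print(pixel_distance(e0, e1) < SECOND_TARGET_THRESHOLD)  # True
print(pixel_distance(e0, e2) < SECOND_TARGET_THRESHOLD)  # False
```

A candidate whose distance falls below the second target threshold would be accepted as the second frame image in step 5.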
Step 5: if the pixel distance between the target frame image and the first frame image is smaller than the second target threshold, the target frame image is determined to be the second frame image, that is, the current target frame image is determined to be similar to the first frame image to be compensated.
Step 6, compensating the subtitle position area of the first frame image by using the area in the second frame image corresponding to the subtitle position of the first frame image, so as to eliminate the subtitles in the first frame image.
Strategy 2, filling algorithm based on context in frame image:
If strategy 1 finds no subtitle-free frame image, or finds no second frame image similar to the first frame image, the subtitles need to be compensated using the context information of the current frame image.
Because the invention redraws new subtitles after the original subtitles are erased and repaired, the precision requirement for compensating the region to be repaired is not excessively high. The invention takes a single character in the subtitles as the repair unit, calculates the neighborhood gradient of the subtitles, and estimates the compensation for the erased region.
Step 1, performing character segmentation on the first area where the positioned subtitles are located. The position information of each character can be represented by the coordinates of the upper-left corner of the region where the character is located, together with the length and width of the character region. Since there are spaces between characters, after the area is binarized and projected in the horizontal-axis direction, positions where the projection value is 0 are taken as the division positions between characters.
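Step 1 can be sketched as follows: sum each column of the binarized region (the horizontal-axis projection) and split at columns whose sum is 0. The tiny binary region below is an illustrative assumption:

```python
def column_projection(region):
    """Horizontal-axis projection: sum each column of a binary region."""
    return [sum(row[x] for row in region) for x in range(len(region[0]))]

def split_characters(region):
    """Return (start, end) column ranges of each character, end exclusive."""
    proj = column_projection(region)
    ranges, start = [], None
    for x, value in enumerate(proj):
        if value > 0 and start is None:
            start = x                      # character begins
        elif value == 0 and start is not None:
            ranges.append((start, x))      # zero column: division position
            start = None
    if start is not None:
        ranges.append((start, len(proj)))
    return ranges

region = [[1, 1, 0, 0, 1, 1, 1],
          [1, 0, 0, 0, 0, 1, 0]]
print(split_characters(region))  # [(0, 2), (4, 7)]
```

Each returned range, combined with the region's top edge and height, gives the upper-left corner plus length and width representation described above.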
Step 2: calculating the pixel mean A_Ti of the region of character T_i.
Step 3: image compensation. For each point to be compensated in the region of character T_i, a compensation value P is calculated by means of neighborhood gradient correction;
G_h = abs(P_left - P_right)
G_v = abs(P_up - P_down)
where the meanings of P_left, P_right, P_up, and P_down are the same as in the examples above.
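The text gives the gradients G_h and G_v but does not reproduce the final combination formula for the compensation value P. The weighting below, which favours the axis with the smaller gradient and falls back to the region mean A_Ti when the neighbourhood is flat, is an illustrative assumption rather than the patented formula:

```python
def compensate(p_left, p_right, p_up, p_down, region_mean):
    """Estimate a fill value P from the four nearest non-subtitle pixels."""
    g_h = abs(p_left - p_right)   # horizontal gradient G_h
    g_v = abs(p_up - p_down)      # vertical gradient G_v
    if g_h == 0 and g_v == 0:
        return region_mean        # flat neighbourhood: use the mean A_Ti
    # Weight each axis by the other axis's gradient, so the smoother
    # (lower-gradient) direction contributes more to the estimate.
    h_avg = (p_left + p_right) / 2
    v_avg = (p_up + p_down) / 2
    return (g_v * h_avg + g_h * v_avg) / (g_h + g_v)

# A smooth horizontal background (left == right) dominates the estimate.
print(compensate(p_left=100, p_right=100, p_up=50, p_down=150, region_mean=90))
```

Since the subtitles are redrawn afterwards, a coarse estimate of this kind is consistent with the text's note that high compensation precision is not required.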
According to the real-time re-rendering technology provided by the invention, the content of the area blocked by the burned-in subtitles is analyzed using the video context information, the burned-in subtitles are deleted, the blocked details are restored, and other subtitles are added as replacements (enlarged or substituted), so that a video medium with subtitles burned in beforehand also gains the characteristics of external subtitles without affecting viewing. The overall amount of calculation of the scheme is small, making it suitable for mobile terminal scenarios. The invention eliminates the subtitles of a video with embedded subtitles and allows new subtitles to be added subsequently, with a small amount of calculation and a high processing speed, without affecting the viewing fluency of the user. The embedded subtitles in the video can thus be eliminated dynamically at a lower cost while maintaining a certain quality guarantee, so that the user obtains a better video watching experience.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
According to another aspect of the embodiments of the present invention, there is also provided an image compensation apparatus for implementing the above-described image compensation method, as shown in fig. 7, the apparatus including:
the acquiring module 72 is configured to acquire a first frame image including embedded subtitles in a video to be processed and a first region where the embedded subtitles are located in the first frame image;
the searching module 74 is configured to search for a second frame image without embedded subtitles in the video to be processed, where a pixel distance between the second frame image and the first frame image meets a target condition;
and a compensation module 76, configured to perform image compensation on the pixel point on the first region by using the pixel point on the second region corresponding to the first region in the second frame image under the condition that the second frame image is found, so as to eliminate the embedded subtitle on the first region.
Optionally, the searching module 74 is further configured to search, in the video to be processed, for a target frame image without embedded subtitles forward or backward from the first frame image, where the time interval between the target frame image and the first frame image is smaller than a first target threshold, and the pixel distance between the pixel points of the target frame image and the pixel points of the first frame image is smaller than a second target threshold; and to determine the target frame image as the second frame image under the condition that the target frame image without embedded subtitles is found.
Optionally, as shown in fig. 8, the apparatus further includes an updating module 80, configured to perform region segmentation on the characters in the embedded subtitle in the first frame image to obtain a group of character regions when the second frame image is not found; executing the following steps for each first pixel point in each character area, wherein each first pixel point is regarded as a current pixel point when the following steps are executed: acquiring second pixel values of four second pixel points which are closest to the current pixel point in four directions of up, down, left and right in the first frame image, and acquiring a mean value of first pixel values of all the first pixel points in the character area where the current pixel point is located, wherein the second pixel points are not located in the group of character areas; and determining a target pixel value of the current pixel point according to the second pixel values and the average value of the four second pixel points, and updating the pixel value of the current pixel point to the target pixel value.
Optionally, the updating module 80 is further configured to obtain the sum of the pixel values of each column of pixel points in the first region; to determine the pixel points on a target column in the first region whose sum of pixel values is less than a third target threshold as pixel points on a dividing line between adjacent characters; and to determine the group of character areas according to the dividing lines and the boundary of the first area, where each character area includes one or more characters.
Optionally, the updating module 80 is further configured to calculate the target pixel value P of the current pixel point by the following formula:
G_h = abs(P_left - P_right);
G_v = abs(P_up - P_down);
where A_Ti is the mean value, P_left is the second pixel value of the pixel point located on the left of the current pixel point among the four second pixel points, P_right is the second pixel value of the pixel point located on the right of the current pixel point among the four second pixel points, P_up is the second pixel value of the pixel point located above the current pixel point among the four second pixel points, P_down is the second pixel value of the pixel point located below the current pixel point among the four second pixel points, and abs() is the absolute value function.
Optionally, the obtaining module 72 is further configured to perform the following steps on each frame image in the video to be processed, where each frame image is regarded as a current frame image when the following steps are performed:
and is further configured to cut out a partial area of the current frame image to obtain a first local frame image;
and is further configured to perform image binarization processing on the first local frame image to obtain a second local frame image, where the pixel points in the second local frame image are black pixel points or white pixel points;
and is further configured to acquire the sum of the pixel values of each column of pixel points in the second local frame image and the sum of the pixel values of each row of pixel points in the second local frame image;
and is further configured to search for a starting column, an ending column, a starting row and an ending row in the second local frame image, where the starting column is a first target column from left to right in the second local frame image, and the difference between the sum of the pixel values of the pixel points on the first target column and the sum of the pixel values of the pixel points on the column located on the left of and adjacent to the first target column is greater than a fourth target threshold; the ending column is a second target column from right to left in the second local frame image, and the difference between the sum of the pixel values of the pixel points on the second target column and the sum of the pixel values of the pixel points on the column located on the right of and adjacent to the second target column is greater than the fourth target threshold; the starting row is a first target row from top to bottom in the second local frame image, and the difference between the sum of the pixel values of the pixel points on the first target row and the sum of the pixel values of the pixel points on the row located above and adjacent to the first target row is greater than the fourth target threshold; the ending row is a second target row from bottom to top in the second local frame image, and the difference between the sum of the pixel values of the pixel points on the second target row and the sum of the pixel values of the pixel points on the row located below and adjacent to the second target row is greater than the fourth target threshold;
and the image processing unit is used for determining the current frame image as a first frame image of the embedded subtitle under the condition that the starting column, the ending column, the starting row and the ending row are found, and determining a region enclosed by the starting column, the ending column, the starting row and the ending row as a first region of the embedded subtitle in the first frame image.
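The start/end scans described above can be sketched as follows; the threshold value and the sample column sums are illustrative assumptions, and the row scans follow the same pattern:

```python
def find_start(sums, threshold):
    """First index (left to right) whose sum jumps above its left neighbour."""
    for i in range(1, len(sums)):
        if sums[i] - sums[i - 1] > threshold:
            return i
    return None

def find_end(sums, threshold):
    """First index (right to left) whose sum jumps above its right neighbour."""
    for i in range(len(sums) - 2, -1, -1):
        if sums[i] - sums[i + 1] > threshold:
            return i
    return None

column_sums = [0, 2, 40, 45, 44, 3, 0]     # bright subtitle band in the middle
print(find_start(column_sums, threshold=10))  # 2
print(find_end(column_sums, threshold=10))    # 4
```

If either scan returns None, no region boundary exceeds the fourth target threshold and the frame is treated as containing no embedded subtitles.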
Optionally, the compensation module 76 is further configured to replace the pixel values of the pixels in the first region with the pixel values of the pixels in the second region.
According to another embodiment of the present invention, there is also provided a subtitle positioning apparatus, as shown in fig. 9, including the following structure:
the first processing module 92 is configured to perform image binarization processing on the first frame image to be processed to obtain a processed third image, where pixel points in the third image are black pixel points or white pixel points;
a second processing module 94, configured to obtain the sum of the pixel values of each column of pixel points in the third image and the sum of the pixel values of each row of pixel points in the third image;
a third processing module 96, configured to detect whether a target column and a target row exist in the third image, where the difference between the sum of the pixel values of the pixel points on the target column and the sum of the pixel values of the pixel points on an adjacent column is greater than a fourth target threshold, and the difference between the sum of the pixel values of the pixel points on the target row and the sum of the pixel values of the pixel points on an adjacent row is greater than the fourth target threshold;
the fourth processing module 98 is configured to determine that the first frame image includes embedded subtitles when the target column and the target row are found in the third image, and determine a region surrounded by the target column and the target row as a first subtitle region of the embedded subtitles.
Optionally, after determining the region surrounded by the target column and the target row as the first subtitle region of the embedded subtitle, the fourth processing module 98 is further configured to, when the first frame image belongs to one frame image in the video to be processed, preliminarily determine, for a fourth frame image in the video to be processed, that a subtitle region of the fourth frame image is a fourth position corresponding to the first subtitle region, except for the first frame image; and the fourth position is used for searching the target column or the target row at the fourth position, wherein when the target column or the target row is detected, the fourth position is determined to be a subtitle area of the fourth frame image; and when the target column or the target row is detected to be absent, determining that the fourth frame image does not comprise the subtitles.
Optionally, the third processing module 96 is configured to detect whether the third image has a target column and a target row, including: the target column comprises a starting column and an ending column, and the target row comprises a starting row and an ending row, where the starting column is a first target column from left to right in the third image, and the difference between the sum of the pixel values of the pixel points on the first target column and the sum of the pixel values of the pixel points on the column located on the left of and adjacent to the first target column is greater than a fourth target threshold; the ending column is a second target column from right to left in the third image, and the difference between the sum of the pixel values of the pixel points on the second target column and the sum of the pixel values of the pixel points on the column located on the right of and adjacent to the second target column is greater than the fourth target threshold; the starting row is a first target row from top to bottom in the third image, and the difference between the sum of the pixel values of the pixel points on the first target row and the sum of the pixel values of the pixel points on the row located above and adjacent to the first target row is greater than the fourth target threshold; the ending row is a second target row from bottom to top in the third image, and the difference between the sum of the pixel values of the pixel points on the second target row and the sum of the pixel values of the pixel points on the row located below and adjacent to the second target row is greater than the fourth target threshold;
the fourth processing module 98 is configured to determine that the first frame image includes embedded subtitles when the target column and the target row are found in the third image, and to determine the region enclosed by the target column and the target row as the first subtitle region of the embedded subtitles, including: under the condition that the starting column, the ending column, the starting row and the ending row are found, determining that the first frame image includes embedded subtitles, and determining the region surrounded by the starting column, the ending column, the starting row and the ending row as the first region where the embedded subtitles are located in the first frame image.
According to another embodiment of the present invention, there is also provided a terminal, as shown in fig. 10, the terminal 100 includes the following hardware structure:
an input interface device 1002, configured to receive a request signal for performing subtitle replacement on a video to be processed, and transmit the request signal to a processor 1004, where the request signal is used to request that subtitles to be replaced are displayed in the video to be processed;
the input interface device may be a device for inputting instructions in the related art, such as a mouse, a keyboard, or a voice device. The request signal may be from a user account, for example, if the user account belongs to the elderly, the request signal may be a request to replace the original subtitle with a larger font size. Or the user account belongs to a user who does not know the foreign language, the request signal can be the subtitle of the local language instead of the foreign language subtitle.
The processor 1004 is connected to the input interface device 1002 and is configured to acquire a first frame image including embedded subtitles in the video to be processed and a first region where the embedded subtitles are located in the first frame image; to search the video to be processed for a second frame image without embedded subtitles, where the pixel distance between the second frame image and the first frame image meets the target condition; to perform, under the condition that the second frame image is found, image compensation on the pixel points in the first area by using the pixel points in a second area corresponding to the first area in the second frame image, so as to eliminate the embedded subtitles in the first area; and to transmit the subtitle to be replaced and the first frame image after image compensation to a display;
the terminal of this embodiment may be a user-side terminal or a server-side computer terminal, and the corresponding processor may also belong to a local processor or a processor on a server.
The display 1006 is connected to the processor 1004 and is configured to display the first frame image after the image compensation and the subtitle to be replaced.
By adopting this scheme, the pixel distance between frame images is calculated to obtain a second frame image without embedded subtitles that is similar to the first frame image with embedded subtitles, and the subtitle area in the first frame image is covered by the corresponding local area in the second frame image. The embedded subtitles are thereby removed rapidly and accurately in real time, and the subtitle to be replaced is subsequently shown on the display. This greatly improves the subtitle replacement speed, avoids stuttering caused by slow subtitle replacement during video playback, and solves the technical problem in the related art that subtitle replacement consumes a large amount of computing resources.
According to yet another aspect of the embodiments of the present invention, there is also provided an electronic device for implementing the image compensation method, as shown in fig. 11, the electronic device includes a memory 1102 and a processor 1104, the memory 1102 stores therein a computer program, and the processor 1104 is configured to execute the steps in any one of the method embodiments by the computer program.
Optionally, in this embodiment, the electronic apparatus may be located in at least one network device of a plurality of network devices of a computer network.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s1, acquiring a first frame image including embedded subtitles in a video to be processed and a first area of the embedded subtitles in the first frame image;
s2, searching a second frame image without embedded subtitles in the video to be processed, wherein the pixel distance between the second frame image and the first frame image meets the target condition;
s3, under the condition that the second frame image is found, performing image compensation on the pixel points in the first region by using the pixel points in the second region corresponding to the first region in the second frame image, so as to eliminate the embedded subtitle in the first region.
Alternatively, it can be understood by those skilled in the art that the structure shown in fig. 11 is only an illustration, and the electronic device may also be a terminal device such as a smart phone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, or a Mobile Internet Device (MID) such as a PAD. Fig. 11 does not limit the structure of the electronic device. For example, the electronic device may also include more or fewer components (e.g., network interfaces, etc.) than shown in fig. 11, or have a different configuration from that shown in fig. 11.
The memory 1102 may be used to store software programs and modules, such as program instructions/modules corresponding to the image compensation method and apparatus in the embodiments of the present invention, and the processor 1104 executes various functional applications and data processing by running the software programs and modules stored in the memory 1102, so as to implement the image compensation method described above. The memory 1102 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1102 may further include memory located remotely from the processor 1104, and such remote memory may be connected to the terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1102 may specifically store, but is not limited to, information such as a frame image and the area where the embedded subtitles are located. As an example, as shown in fig. 11, the memory 1102 may include, but is not limited to, the obtaining module 72, the searching module 74, and the compensation module 76 of the image compensation apparatus. In addition, other module units in the image compensation apparatus may also be included, which are not described in detail in this example.
Optionally, the transmitting device 1106 is used for receiving or transmitting data via a network. Examples of the network may include wired networks and wireless networks. In one example, the transmitting device 1106 includes a network adapter (NIC) that can be connected to a router via a network cable to communicate with the internet or a local area network. In another example, the transmitting device 1106 is a radio frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In addition, the electronic device further includes: a display 1108 for displaying the frame image; and a connection bus 1110 for connecting the respective module parts in the above-described electronic apparatus.
According to a further aspect of an embodiment of the present invention, there is also provided a computer-readable storage medium having a computer program stored thereon, wherein the computer program is arranged to perform the steps of any of the above method embodiments when executed.
Alternatively, in the present embodiment, the above-mentioned computer-readable storage medium may be configured to store a computer program for executing the steps of:
s1, acquiring a first frame image including embedded subtitles in a video to be processed and a first area of the embedded subtitles in the first frame image;
s2, searching a second frame image without embedded subtitles in the video to be processed, wherein the pixel distance between the second frame image and the first frame image meets the target condition;
s3, under the condition that the second frame image is found, performing image compensation on the pixel points in the first region by using the pixel points in the second region corresponding to the first region in the second frame image, so as to eliminate the embedded subtitle in the first region.
Alternatively, in this embodiment, a person skilled in the art may understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present invention, it should be understood that the disclosed client can be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.