CN110019961A


Info

Publication number: CN110019961A
Application number: CN201710736673.7A
Authority: CN
Inventors: 张�杰; 卜海亮; 靳一笑; 邢真臻; 蒋品; 冯新强
Original assignee: Beijing Sogou Technology Development Co Ltd
Current assignee: Beijing Sogou Technology Development Co Ltd
Priority date: 2017-08-24
Filing date: 2017-08-24
Publication date: 2019-07-16
Also published as: WO2019037615A1

Abstract

Embodiments of the present invention provide a video processing method and apparatus, and an apparatus for video processing. The method specifically includes: recognizing a video stream and/or an audio stream corresponding to a video to obtain a corresponding recognition result; obtaining, from a preset article library, a target article that matches the recognition result; and adding target information corresponding to the target article to a video frame corresponding to the video stream and/or the audio stream. Embodiments of the present invention can shorten video processing time, improve video processing efficiency, and increase the video coverage rate of the target information.

Description

Video processing method and apparatus, and apparatus for video processing

Technical field

The present invention relates to the field of video technology, and in particular to a video processing method and apparatus, and an apparatus for video processing.

Background art

With the development of Internet technology, more and more users are accustomed to watching videos on terminals such as computers and mobile phones. Specifically, a user can watch videos of interest through the player of a locally installed client or through a player embedded in a web page.

Currently, information can be added to a video through video processing. In existing schemes, information is added manually: after watching a video, an operator first extracts the video frames suitable for adding information, then obtains the information corresponding to those video frames, and finally inserts the obtained information into the video frames using an editing system.

However, adding information to videos manually costs considerable time and labor, which results in low video processing efficiency.

Summary of the invention

In view of the above problems, embodiments of the present invention are proposed to provide a video processing method, a video processing apparatus, and an apparatus for video processing that overcome, or at least partially solve, the above problems. Embodiments of the present invention can shorten video processing time, improve video processing efficiency, and increase the video coverage rate of the target information.

To solve the above problems, the present invention discloses a video processing method, comprising:

recognizing a video stream and/or an audio stream corresponding to a video to obtain a corresponding recognition result;

obtaining, from a preset article library, a target article that matches the recognition result; and

adding target information corresponding to the target article to a video frame corresponding to the video stream and/or the audio stream.

In another aspect, the present invention discloses a video processing apparatus, comprising:

a recognition module, configured to recognize a video stream and/or an audio stream corresponding to a video to obtain a corresponding recognition result;

a target article obtaining module, configured to obtain, from a preset article library, a target article that matches the recognition result; and

a target information adding module, configured to add target information corresponding to the target article to a video frame corresponding to the video stream and/or the audio stream.

Optionally, the recognition module includes:

an image recognition submodule, configured to perform image recognition on the video stream corresponding to the video to obtain corresponding image object information; and/or

a text recognition submodule, configured to perform text recognition on the video stream corresponding to the video to obtain corresponding text information; and/or

a speech recognition submodule, configured to perform speech recognition on the audio stream corresponding to the video to obtain corresponding text information.

Optionally, the target article obtaining module includes:

a first judging submodule, configured to, when the recognition result includes image object information, judge whether the image object information includes a second article that is identical or similar to, or of the same category as, a first article in the preset article library, and if so, take the first article as the target article matching the recognition result; and/or

a second judging submodule, configured to, when the recognition result includes text information, judge whether the text information includes information matching the characteristic information corresponding to a first article, or to the category of the first article, in the preset article library, and if so, take the first article as the target article matching the text information.
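The second judging submodule can be illustrated with a minimal sketch, assuming the characteristic information of each first article is a set of keywords and that a plain case-insensitive substring test is good enough. `FirstArticle` and `match_text` are hypothetical names; a production system might use tokenization, synonyms or embeddings instead.

```python
from dataclasses import dataclass, field


@dataclass
class FirstArticle:
    """Hypothetical preset-library entry: a name plus keyword-style
    characteristic information for matching against recognized text."""
    name: str
    keywords: set[str] = field(default_factory=set)


def match_text(text: str, library: list[FirstArticle]) -> list[FirstArticle]:
    """Return the first articles whose name or any keyword occurs in the text
    produced by OCR or speech recognition (case-insensitive substring match)."""
    lowered = text.lower()
    hits = []
    for article in library:
        terms = {article.name, *article.keywords}
        if any(term.lower() in lowered for term in terms if term):
            hits.append(article)
    return hits


library = [
    FirstArticle("cola", {"soda", "soft drink"}),
    FirstArticle("sneakers", {"running shoes"}),
]
matched = match_text("He grabbed a cold soda from the fridge", library)
print([a.name for a in matched])  # ['cola']
```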

Optionally, the first judging submodule includes:

a matching unit, configured to match the characteristic information of the second article included in the image object information against the characteristic information of the first article in the preset article library to obtain a corresponding matching result; and

a target article determination unit, configured to, if the matching result indicates a successful match, determine that the recognition result includes a target article that is identical or similar to, or of the same category as, the first article in the preset article library.

Optionally, the target information adding module includes:

a target position determining submodule, configured to determine, in the video frame corresponding to the video stream and/or the audio stream, a target position for adding the target information; and

an adding submodule, configured to add the target information at the target position in the video frame.

Optionally, the target position determining submodule includes:

a target video frame selecting unit, configured to select, from the video frames corresponding to the audio stream, a target video frame suitable for adding the target information; and

a target position determination unit, configured to determine, in the target video frame, the target position for adding the target information.

Optionally, the target video frame selecting unit includes:

a target recognition result obtaining subunit, configured to obtain, as a target recognition result, the information in the recognition result that matches the characteristic information of the target article;

a target audio extracting subunit, configured to extract, as target audio, the part of the audio stream corresponding to the target recognition result; and

a target video frame determining subunit, configured to take the video frames corresponding to the target audio as the target video frames.
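The three subunits above can be sketched as follows, assuming the speech recognizer reports word-level timestamps in seconds and the video has a constant frame rate. `target_frames` is a hypothetical helper: matching words play the role of the target recognition result, their time spans are the target audio, and the frames on screen during those spans are the target video frames.

```python
import math


def target_frames(words, feature_terms, fps):
    """Select target video frames using the audio recognition result.

    words: (text, start_s, end_s) tuples, assumed to come from a speech
        recognizer that reports word-level timestamps.
    feature_terms: lower-case characteristic information of the target article.
    fps: frame rate of the video stream.
    Returns the sorted indices of frames displayed while a matching word is spoken.
    """
    frames = set()
    for text, start, end in words:
        if text.lower() in feature_terms:      # target recognition result
            # Frame k is on screen during [k / fps, (k + 1) / fps).
            first = math.floor(start * fps)
            last = math.ceil(end * fps) - 1
            frames.update(range(first, last + 1))
    return sorted(frames)


words = [("let's", 0.0, 0.25), ("drink", 0.25, 0.5), ("Cola", 0.5, 1.0)]
print(target_frames(words, {"cola"}, fps=8))  # [4, 5, 6, 7]
```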

Optionally, the target position determining submodule includes:

a first target position determination unit, configured to determine the degree of conformity between each existing article in the video frame corresponding to the video stream and/or the audio stream and the target article, and to take, as the target position, the position of an existing article in the video frame whose degree of conformity meets a preset condition; and/or

a second target position determination unit, configured to identify, in the video frame corresponding to the video stream and/or the audio stream, a predicted image target region suitable for adding the target information, and to take the predicted image target region as the target position.
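The first target position determination unit can be sketched as a filter over detected articles. This assumes detections arrive as (category, bounding box) pairs and that some scoring function returns a conformity degree in [0, 1]; the threshold rule, the score table and `select_target_positions` are all hypothetical.

```python
from typing import Callable

Box = tuple[int, int, int, int]  # (x, y, width, height) in pixels


def select_target_positions(existing: list[tuple[str, Box]],
                            conformity: Callable[[str], float],
                            threshold: float = 0.6) -> list[Box]:
    """Return the boxes of existing articles whose degree of conformity with the
    target article meets the preset condition (score >= threshold), best first."""
    scored = [(conformity(category), box) for category, box in existing]
    kept = [(score, box) for score, box in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [box for _, box in kept]


# Hypothetical conformity scores for placing a beverage advertisement.
scores = {"soda can": 0.9, "coffee cup": 0.7, "laptop": 0.1}
detected = [("laptop", (10, 10, 200, 120)),
            ("coffee cup", (300, 80, 40, 60)),
            ("soda can", (420, 90, 30, 55))]
print(select_target_positions(detected, lambda c: scores.get(c, 0.0)))
# [(420, 90, 30, 55), (300, 80, 40, 60)]
```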

Optionally, the target position is a subtitle-related position;

the adding submodule includes:

a subtitle modification unit, configured to modify the subtitle included in the video frame according to the target information, so as to add the target information to the subtitle included in the video frame; and/or

a subtitle addition unit, configured to add the target information around the subtitle in the video frame as additional information of the subtitle, so as to add the target information to the video frame.
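The two subtitle units can be sketched on subtitle text alone, assuming subtitles are available as plain strings (for example from a subtitle file) and that replacing a generic word is an acceptable modification rule. Both functions are hypothetical illustrations; rendering the result into the frame is not shown.

```python
import re


def modify_subtitle(subtitle: str, generic_term: str, target_info: str) -> str:
    """Subtitle modification: replace a generic mention (whole word,
    case-insensitive) with the target information."""
    pattern = re.compile(rf"\b{re.escape(generic_term)}\b", re.IGNORECASE)
    # A function replacement avoids treating backslashes in target_info as escapes.
    return pattern.sub(lambda _match: target_info, subtitle)


def append_to_subtitle(subtitle: str, target_info: str) -> str:
    """Subtitle addition: keep the original line and place the target
    information next to it, here on a separate bracketed line below."""
    return f"{subtitle}\n[{target_info}]"


line = "Pass me a soda, please."
print(modify_subtitle(line, "soda", "BrandX Cola"))  # Pass me a BrandX Cola, please.
print(append_to_subtitle(line, "BrandX Cola"))
```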

Optionally, the target information adding module includes:

a video frame information modification submodule, configured to modify, according to the target information, the information at the target position in the video frame corresponding to the video stream and/or the audio stream, to obtain a modified video frame; or

an additional submodule, configured to use the target information as additional information at the corresponding target position in the video frame corresponding to the video stream and/or the audio stream.

Optionally, the apparatus further includes:

an audio stream modification module, configured to modify the audio stream according to the target information to obtain a modified audio stream that matches the target information.

Optionally, the audio stream modification module includes:

a voice feature obtaining submodule, configured to obtain the voice features corresponding to the audio stream;

a speech synthesis submodule, configured to perform speech synthesis on the target information using the voice features to obtain target audio; and

a replacement submodule, configured to replace, with the target audio, the audio in the audio stream that matches the target article, and to take the resulting audio stream as the modified audio stream.

Optionally, the apparatus further includes: a time axis alignment module, configured to align the time axis of the modified audio stream with that of the audio stream before modification.
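Replacement and time-axis alignment can be sketched together, assuming the audio is a list of mono samples, the sample range matching the target article is already known, and speech synthesis has already produced the target audio. Fitting the target audio to the replaced segment's length keeps the modified stream aligned with the original. Naive linear resampling, used here for brevity, also shifts pitch; a real system would more likely use a pitch-preserving time-stretching method. Both function names are hypothetical.

```python
def fit_length(samples: list[float], n: int) -> list[float]:
    """Stretch or compress samples to exactly n samples by linear interpolation."""
    if n <= 0:
        return []
    if not samples:
        return [0.0] * n               # nothing synthesized: fill with silence
    if len(samples) == 1 or n == 1:
        return [samples[0]] * n
    step = (len(samples) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def replace_and_align(audio: list[float], start: int, end: int,
                      target_audio: list[float]) -> list[float]:
    """Replace audio[start:end] (the audio matching the target article) with
    target audio fitted to the same number of samples, so that the modified
    stream keeps the original length and time axis."""
    fitted = fit_length(target_audio, end - start)
    return audio[:start] + fitted + audio[end:]


original = [0.0] * 10
modified = replace_and_align(original, 2, 5, [1.0, 3.0])
print(modified)  # [0.0, 0.0, 1.0, 2.0, 3.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```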

In yet another aspect, the present invention discloses an apparatus for video processing, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations: recognizing a video stream and/or an audio stream corresponding to a video to obtain a corresponding recognition result; obtaining, from a preset article library, a target article that matches the recognition result; and adding target information corresponding to the target article to a video frame corresponding to the video stream and/or the audio stream.

In still another aspect, the present invention discloses a machine-readable medium storing instructions that, when executed by one or more processors, cause an apparatus to perform one or more of the video processing methods described above.

Embodiments of the present invention have the following advantages:

Embodiments of the present invention automatically recognize, by machine, the video stream and/or audio stream corresponding to a video, obtain the target article in the preset article library that matches the recognition result, and add the target information corresponding to the target article to the corresponding video frame. Because the target article matching the recognition result of the video stream and/or audio stream can be obtained quickly without manual intervention, video processing time is shortened and video processing efficiency is improved.

Moreover, with shorter processing time per video, the number of videos that can be processed per unit time can grow by orders of magnitude, and the number of machines processing videos can be scaled out as needed through computing clusters, thereby increasing the video coverage rate of the target information.

Further, embodiments of the present invention perform video processing by combining image recognition with matching against the preset article library. When the information in the preset article library changes, the latest target article and its target information can therefore be obtained by matching against the library, which improves the timeliness of the target information added to video frames and can, to some extent, even enable real-time updating of the target information.

Brief description of the drawings

Fig. 1 is a flowchart of the steps of Embodiment 1 of a video processing method according to the present invention;

Fig. 2 is a flowchart of the steps of Embodiment 2 of a video processing method according to the present invention;

Fig. 3 is a structural block diagram of an embodiment of a video processing apparatus according to the present invention;

Fig. 4 is a structural block diagram of an apparatus 900 for video processing according to the present invention when implemented as a terminal; and

Fig. 5 is a schematic structural diagram of a server in some embodiments of the present invention.

Detailed description of embodiments

To make the above objectives, features and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Embodiments of the present invention provide a video processing scheme that can recognize the video stream and/or audio stream corresponding to a video to obtain a corresponding recognition result; obtain, from a preset article library, a target article that matches the recognition result; and add target information corresponding to the target article to the video frame corresponding to the video stream and/or audio stream.

Embodiments of the present invention automatically recognize, by machine, the video stream and/or audio stream corresponding to a video, obtain the target article in the preset article library that matches the recognition result, and add the target information corresponding to the target article to the video frame corresponding to the video stream and/or audio stream. Because the target article matching the recognition result of the video stream and/or audio stream can be obtained quickly without manual intervention, video processing time is shortened and video processing efficiency is improved.

Moreover, with shorter processing time per video, the number of videos that can be processed per unit time can grow by orders of magnitude, and the number of machines processing videos can be scaled out as needed through computing clusters, further improving video processing efficiency.

Further, embodiments of the present invention perform video processing by combining image recognition with matching against the preset article library. When the information in the preset article library changes, the latest target article and its target information can therefore be obtained by matching against the library, which shortens the update cycle of the target information; for example, real-time updating of the target information can be achieved to some extent.

The video processing scheme provided by embodiments of the present invention can process videos from any video platform, and can process either offline videos or live (real-time) videos, where a live video may correspond to a live scene such as a match or a party. A video platform may be any network platform that provides videos; in practical applications, examples include video websites and/or video apps (applications).

Referring to Fig. 1, an exemplary block diagram of a video processing system according to an embodiment of the present invention is shown. The system may include a video server 101, a video client 102 and a video processing apparatus 103. The video server 101 and the video client 102 may be located in a wired or wireless network, through which they exchange data; the video server 101 may also exchange data with the video processing apparatus 103 through a wired or wireless network.

In practical applications, the video server 101 can provide a first video to the video client 102, so that the video client 102 plays the first video provided by the video server 101; for example, the corresponding first video can be provided to the video client 102 in response to a play request or download request from the video client 102.

In addition, the video server 101 can provide the video processing apparatus 103 with a second video to which information needs to be added. The video processing apparatus 103 can then process the second video using the video processing scheme of embodiments of the present invention to obtain a second video with target information added, and send it to the video server 101.

In practical applications, the second video may be an offline video or a live video. When the second video is an offline video, it may be, for example, a currently popular video; the video server 101 can send the offline video to the video processing apparatus 103, obtain from it the offline video with target information added, and store that video. Then, upon receiving a play request or download request from the video client 102, the first video provided to the video client 102 may be the second video, with target information added, that corresponds to the request.

When the second video is a live video, the video server 101 can receive a play request from the video client 102, which may carry information such as the URL (Uniform Resource Locator) of the live video. The video server 101 can then obtain the live video according to the URL, send it to the video processing apparatus 103, and obtain from it the live video with target information added; the first video provided to the video client 102 may then be the live video with target information added.

It can be understood that the video processing system shown in Fig. 1 is only one example of an application environment for the video processing method of embodiments of the present invention, which can be applied in any application environment. For example, the method can also be applied on the client side: the video client 102 can use the video processing method of embodiments of the present invention to process the first video provided by the video server 101, for instance to add target information to it. Embodiments of the present invention do not limit the specific application environment.

Method embodiment

Referring to Fig. 2, a flowchart of the steps of an embodiment of a video processing method of the present invention is shown, which may specifically include the following steps:

Step 201: recognize the video stream and/or audio stream corresponding to a video to obtain a corresponding recognition result;

Step 202: obtain, from a preset article library, a target article that matches the recognition result;

Step 203: add target information corresponding to the target article to the video frame corresponding to the video stream and/or audio stream.
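Steps 201 to 203 can be tied together in a minimal sketch. It assumes a recognizer that returns a set of labels per frame (a stand-in for image, OCR and speech recognition) and a preset article library reduced to a dictionary from article name to target information. `Frame`, `process_video` and the sample data are hypothetical; matching is exact label equality rather than the identical/similar/same-category matching described later.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Frame:
    index: int
    overlays: list[str] = field(default_factory=list)  # target information added


def process_video(frames: list[Frame],
                  recognize: Callable[[Frame], set[str]],
                  library: dict[str, str]) -> list[Frame]:
    """Run steps 201 to 203 over every frame.

    recognize: stand-in for image/OCR/speech recognition; returns labels.
    library: preset article library mapping article name -> target information.
    """
    for frame in frames:
        labels = recognize(frame)                            # step 201
        targets = sorted(labels & set(library))              # step 202
        frame.overlays.extend(library[t] for t in targets)   # step 203
    return frames


library = {"cola": "BrandX Cola: shop now", "sneakers": "RunFast sneakers"}
fake_results = {0: {"person", "cola"}, 1: {"tree"}}
frames = process_video([Frame(0), Frame(1)], lambda f: fake_results[f.index], library)
print([f.overlays for f in frames])  # [['BrandX Cola: shop now'], []]
```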

Embodiments of the present invention do not limit the source of the video in step 201. For example, the video may come from a video server or from a user. When the video comes from a video server, it may be an offline video or a live video. When the video comes from a user, an upload interface can be provided to the user, for example through a website or app, and the video uploaded by the user through that interface is used as the video in step 201.

A video usually consists of still pictures, which are called video frames. The video stream corresponding to a video represents the consecutive video frames. The audio stream corresponding to a video represents a continuous audio signal that is synchronized with the consecutive video frames, so that the pictures and the audio are played together.

In practical applications, the audio stream of a video may correspond to video content such as dialogue and soundtrack; the soundtrack may include the theme song, interludes, the ending song, background music accompanying the dialogue, and so on. Embodiments of the present invention do not limit the specific video content corresponding to the audio stream.

In practical applications, the video stream and audio stream of a video may be stored in the same file, in which case the audio can be extracted from the video file. Specifically, the video file can be converted into an audio file; for example, a video file in MP4 (MPEG-4 Part 14) format can be converted into an audio file in MP3 (MPEG-1 Audio Layer III) format. Alternatively, the video stream and audio stream may be stored in separate files, that is, the video file and the audio file are independent, in which case the audio file can be obtained directly. Since such an audio file contains the audio stream of the video, the audio stream can be read from it.
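For the single-file case, one common way to perform the MP4-to-MP3 conversion is the ffmpeg command-line tool. The sketch below assumes an ffmpeg binary built with the `libmp3lame` encoder is on `PATH`; `-vn` drops the video stream and `-c:a libmp3lame` selects the MP3 encoder. The helper names are hypothetical, and the command builder is kept separate so it can be inspected without running ffmpeg.

```python
import subprocess


def extract_audio_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg command that overwrites the output (-y), drops the video
    stream (-vn) and encodes the audio track as MP3 with libmp3lame."""
    return ["ffmpeg", "-y", "-i", video_path, "-vn", "-c:a", "libmp3lame", audio_path]


def extract_audio(video_path: str, audio_path: str) -> None:
    # Requires an ffmpeg binary with libmp3lame support on PATH.
    subprocess.run(extract_audio_cmd(video_path, audio_path), check=True)


print(" ".join(extract_audio_cmd("movie.mp4", "movie.mp3")))
```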

In practical applications, video frames can be extracted from the video at a preset time interval and used as the objects of image recognition. Those skilled in the art can set the preset time interval according to practical application requirements; for example, it may be the playback duration of N video frames, where N is a positive integer. Embodiments of the present invention do not limit the specific value of N or of the preset time interval.
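As a worked example of the preset interval, if it equals the playback duration of N frames, it lasts N / fps seconds, and sampling simply keeps every N-th frame. The sketch assumes a constant frame rate; both helpers are hypothetical.

```python
def sample_frame_indices(total_frames: int, n: int) -> list[int]:
    """Indices of the frames to recognize when one frame is taken every n frames."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return list(range(0, total_frames, n))


def interval_seconds(n: int, fps: float) -> float:
    """Preset time interval expressed as the playback duration of n frames."""
    return n / fps


# A 10-second clip at 25 fps, sampled once per second (n = 25).
indices = sample_frame_indices(250, 25)
print(indices)                     # [0, 25, 50, 75, 100, 125, 150, 175, 200, 225]
print(interval_seconds(25, 25.0))  # 1.0
```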

Embodiments of the present invention can recognize the video stream and/or audio stream of a video using the following recognition methods:

Recognition method 1: perform image recognition on the video stream corresponding to the video to obtain corresponding image object information; and/or

Recognition method 2: perform text recognition on the video stream corresponding to the video to obtain corresponding text information; and/or

Recognition method 3: perform speech recognition on the audio stream corresponding to the video to obtain corresponding text information.

In recognition method 1, image recognition refers to the technology of using machines to process, analyze and understand images in order to recognize image objects of various patterns. In embodiments of the present invention, machines can process, analyze and understand video frames to recognize the image objects they contain. An image object in a video frame usually corresponds to a certain image region of the frame and may be an article, a person, a space, and so on. For example, a person may be a character in the video frame, an article may be an item worn by that character, and a space may be the environment in which the character is located, such as an outdoor or indoor environment, where an indoor environment may include information such as walls and floors. Embodiments of the present invention do not limit the specific image objects in a video frame.

In an optional embodiment of the present invention, performing image recognition on the video frames corresponding to the video stream and/or audio stream may include: detecting the image objects in a video frame, and analyzing the detected image objects with deep learning methods to obtain corresponding image object information. Accordingly, the recognition result of embodiments of the present invention may include the image object information of a video frame. The image object information may include the image of the image object (that is, its image within the video frame, which usually corresponds to a closed region of the frame) and the recognition result of the image object (such as its recognized name and category). For example, face detection can locate faces in a video frame, and deep learning can then analyze each face to obtain information such as the character's gender and age, or even the character's source (for example, which film or TV series the character comes from) or which celebrity the character is. Further, articles worn by the character, such as clothes, shoes, watches and jewelry, can be detected, as can spatial information about where the character is located.

Text information in a video frame may include text contained in the image and/or text in the subtitles.

For recognition method 2, text recognition technology can be used to recognize text in the video frames corresponding to the video stream and/or audio stream. Such technology may include OCR (Optical Character Recognition): after preprocessing an image, for example by denoising, OCR segments the characters in the image into single-character images and recognizes the character corresponding to each one. Embodiments of the present invention do not limit the specific text recognition technology.

For recognition method 2, the subtitle file corresponding to the subtitles of a video frame can also be obtained and the subtitle text read from it; alternatively, a screenshot of the picture corresponding to the video frame can be taken and text recognition performed on the screenshot to obtain the subtitle text. Embodiments of the present invention do not limit the specific way of obtaining subtitle text.
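The preprocessing and segmentation stages of OCR can be sketched with a classic vertical-projection approach, assuming a grayscale image of a single text line given as nested lists, dark text on a light background, and characters separated by at least one ink-free column. This is an illustrative technique, not necessarily what any particular OCR engine does; the per-character classifier that follows segmentation is not shown.

```python
def binarize(gray: list[list[int]], threshold: int = 128) -> list[list[int]]:
    """Preprocessing: dark (text) pixels become 1, light background becomes 0."""
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def segment_characters(binary: list[list[int]]) -> list[tuple[int, int]]:
    """Split one line of text into [start, end) column spans, one per character,
    using the vertical projection profile: ink-free columns separate characters."""
    width = len(binary[0])
    ink = [sum(row[x] for row in binary) for x in range(width)]
    spans, start = [], None
    for x, count in enumerate(ink):
        if count and start is None:
            start = x                  # entering a character
        elif not count and start is not None:
            spans.append((start, x))   # leaving a character
            start = None
    if start is not None:
        spans.append((start, width))   # character touches the right edge
    return spans


gray = [
    [0, 255, 255, 0, 0, 255],
    [0, 255, 255, 0, 255, 255],
]
print(segment_characters(binarize(gray)))  # [(0, 1), (3, 5)]
```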

For recognition method 3, speech recognition technology can be used to convert the audio stream of the video into text information. Denote the audio stream of the video by $S$. After a series of processing steps on $S$, a corresponding speech feature sequence $O = \{O_1, O_2, \ldots, O_i, \ldots, O_T\}$ is obtained, where $O_i$ is the $i$-th speech feature and $T$ is the total number of speech features. The sentence corresponding to $S$ can be regarded as a word string made up of many words, denoted $W = \{w_1, w_2, \ldots, w_n\}$. Speech recognition is the process of finding the most probable word string $W$ given the known speech feature sequence $O$.
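"Finding the most probable word string" is conventionally written as a maximum a posteriori decision and expanded with Bayes' rule. This is the standard textbook formulation, given here for illustration rather than as the specific model of this embodiment:

```latex
W^{*} = \arg\max_{W} P(W \mid O)
      = \arg\max_{W} \frac{P(O \mid W)\, P(W)}{P(O)}
      = \arg\max_{W} P(O \mid W)\, P(W)
```

The denominator $P(O)$ is the same for every candidate $W$ and can be dropped; $P(O \mid W)$ is supplied by an acoustic model (for example an HMM) and $P(W)$ by a language model.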

Specifically, speech recognition is a model-matching process. First, speech models are built from the features of human speech; the input speech signal is analyzed and the required features are extracted to build the templates needed for recognition. Recognizing speech input by a user then consists of comparing the features of the input speech with these templates and finally determining the template that best matches the input, which yields the recognition result. Specific speech recognition algorithms include training and recognition algorithms based on statistical hidden Markov models, training and recognition algorithms based on neural networks, recognition algorithms based on dynamic time warping (DTW) matching, and others. Embodiments of the present invention do not limit the specific speech recognition process.
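Of the algorithms listed, DTW-based template matching is the simplest to show end to end. The sketch assumes each utterance has already been turned into a sequence of feature vectors (one-dimensional toy values here; real systems would use features such as MFCCs) and that there is one template per word. `dtw_distance` and `best_template` are hypothetical names.

```python
import math


def dtw_distance(a: list[tuple[float, ...]], b: list[tuple[float, ...]]) -> float:
    """Dynamic time warping distance between two feature sequences, using the
    Euclidean distance between individual feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: stretching a, stretching b, or a diagonal step.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_template(features, templates: dict[str, list]) -> str:
    """Return the label of the template closest to the input under DTW."""
    return min(templates, key=lambda label: dtw_distance(features, templates[label]))


templates = {"yes": [(0.0,), (1.0,), (2.0,)], "no": [(5.0,), (5.0,), (5.0,)]}
spoken = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]  # "yes", spoken more slowly
print(best_template(spoken, templates))            # yes
```

Because DTW lets one sequence stretch against the other, the slower utterance still matches the "yes" template at zero cost.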

After step 201 obtains the recognition result of the video stream and/or audio stream, step 202 can obtain, from the preset article library, the target article that matches the recognition result.

The preset article library stores first articles, and each first article may further correspond to characteristic information and target information. In practical applications, the first articles and their characteristic information and target information can be obtained in cooperation with operators.

The characteristic information of a first article characterizes the features of that article and serves as the basis for matching against text information.

Target information is the information to be added to video frames. For example, it may be information that attracts users, such as the logo or a picture of the first article; or it may be an access entrance such as a link, through which the user can open the page corresponding to the first article.

Examples of first articles include goods such as clothes, shoes, beverages and accessories. Target information may include information in picture formats, such as a logo, display image or poster, and/or information in text format. Operators can determine the first articles to recommend and their target information according to practical application requirements; embodiments of the present invention do not limit the specific first articles or their target information.
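A preset article library record can be sketched as an immutable entry carrying both the characteristic information used for matching (shape, color, category) and the target information to add (text, picture or link). All class names, fields and sample values are hypothetical. The `upsert` method reflects the point made earlier that changes to the library are picked up by later matching.

```python
from dataclasses import dataclass
from enum import Enum


class InfoFormat(Enum):
    TEXT = "text"
    PICTURE = "picture"   # e.g. logo, display image, poster
    LINK = "link"         # access entrance to the article's page


@dataclass(frozen=True)
class TargetInfo:
    fmt: InfoFormat
    payload: str          # text, image path, or URL


@dataclass(frozen=True)
class LibraryArticle:
    name: str
    shape: str            # characteristic information used for matching
    color: str
    category: str
    target_info: tuple[TargetInfo, ...]


class PresetArticleLibrary:
    def __init__(self) -> None:
        self._articles: dict[str, LibraryArticle] = {}

    def upsert(self, article: LibraryArticle) -> None:
        """Add or replace an entry; later lookups see the newest target info."""
        self._articles[article.name] = article

    def by_category(self, category: str) -> list[LibraryArticle]:
        return [a for a in self._articles.values() if a.category == category]


library = PresetArticleLibrary()
library.upsert(LibraryArticle(
    name="BrandX Cola", shape="can", color="red", category="beverage",
    target_info=(TargetInfo(InfoFormat.LINK, "https://example.com/brandx-cola"),),
))
library.upsert(LibraryArticle(
    name="RunFast", shape="shoe", color="white", category="shoes",
    target_info=(TargetInfo(InfoFormat.PICTURE, "runfast-logo.png"),),
))
print([a.name for a in library.by_category("beverage")])  # ['BrandX Cola']
```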

It can also be understood that having operators provide the first articles and their characteristic information and target information is only an optional embodiment. In fact, those skilled in the art can obtain them in other ways according to practical application requirements, for example from the user's historical behavior data: the user's interest features can be derived from the historical behavior data, and the first articles corresponding to those interest features can then be obtained. An interest feature may be, for example, a feature of products the user has purchased, or a feature of the same category as those products. Embodiments of the present invention do not limit the specific ways of obtaining the first articles and their target information.
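One simple way to derive interest features from historical behavior data is to rank the categories a user has purchased most often. The sketch assumes a hypothetical event schema (`{"action": ..., "category": ...}`) and a catalog mapping article names to categories; real systems would likely weight recency and other behaviors as well.

```python
from collections import Counter


def interest_categories(history: list[dict[str, str]], top_k: int = 2) -> list[str]:
    """Derive interest features as the categories the user purchased most often."""
    counts = Counter(e["category"] for e in history if e["action"] == "purchase")
    return [category for category, _ in counts.most_common(top_k)]


def candidate_first_articles(catalog: dict[str, str],
                             history: list[dict[str, str]],
                             top_k: int = 2) -> list[str]:
    """Names of catalog articles (name -> category) matching an interest feature."""
    wanted = set(interest_categories(history, top_k))
    return sorted(name for name, category in catalog.items() if category in wanted)


history = [
    {"action": "purchase", "category": "shoes"},
    {"action": "purchase", "category": "shoes"},
    {"action": "view", "category": "watches"},
    {"action": "purchase", "category": "beverages"},
]
catalog = {"RunFast sneakers": "shoes", "BrandX Cola": "beverages", "LuxWatch": "watches"}
print(candidate_first_articles(catalog, history))  # ['BrandX Cola', 'RunFast sneakers']
```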

In an optional embodiment of the present invention, the recognition result includes image object information, and obtaining the target article matching the recognition result from the preset article library in step 202 may include: judging whether the image object information includes a second article that is identical or similar to, or of the same category as, a first article in the preset article library, and if so, taking the first article as the target article matching the image object information. Because a first article that is identical or similar to, or of the same category as, a second article in the image object information can be taken as the target article, the video coverage rate of the target information can be improved. For example, "cap 1" in the image object information is identical to "cap 2" in the preset article library; "suit 1" in the image object information is similar to "suit 2" in the library; and if the library contains "cola" while the image object information contains "Sprite", the two belong to the same category of canned soft drinks.

Specifically, judging whether the image target information contains a second article that is identical, similar, or of the same category as a first article in the preset article library may include: matching the characteristic information of the second article contained in the image target information against the characteristic information of the first article in the preset article library to obtain a matching result; if the matching succeeds, determining that the image target information contains a second article identical, similar, or of the same category as the first article in the preset article library. The characteristic information may include at least one of shape, color, and category.

In practice, the shape of the second article can be determined from its contour in the image target information; and/or its color can be determined from its color values (such as RGB (Red, Green, Blue) values); and/or the second article can be analyzed with a deep learning method to obtain its category.
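The color step can be illustrated with a minimal sketch, assuming the pixels inside the article's contour are already available as (R, G, B) tuples. The palette below is a hypothetical reference table, not something defined in this document, and nearest-color lookup by squared RGB distance is just one simple way to turn color values into a color label.

```python
from statistics import mean

# Hypothetical reference palette (name -> RGB); not taken from the patent.
PALETTE = {
    "claret": (127, 23, 52),
    "red": (220, 20, 60),
    "navy": (0, 0, 128),
    "black": (0, 0, 0),
    "white": (255, 255, 255),
}


def mean_rgb(pixels):
    """Average the (R, G, B) tuples of the pixels inside the article's contour."""
    return tuple(round(mean(channel)) for channel in zip(*pixels))


def nearest_colour(rgb, palette=PALETTE):
    """Return the palette name with the smallest squared Euclidean distance in RGB space."""
    return min(
        palette,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, palette[name])),
    )


# Usage: two pixels sampled from a dark-red suit.
suit_pixels = [(130, 20, 50), (124, 26, 54)]
print(nearest_colour(mean_rgb(suit_pixels)))
```

Averaging is crude for patterned clothing; a production system would more likely cluster the pixels and take the dominant cluster.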

Optionally, matching the characteristic information of the second article contained in the image target information against that of the first article in the preset article library may include: determining the similarity between the characteristic information of the second article and that of the first article, and judging whether the similarity satisfies a preset similarity condition; if it does, the matching result is a successful match.

For example, the shape and color of the second article in the image target information can be matched against the shape and color of the first article in the preset article library; if they match, the first article can be considered to match the second article. For instance, if a piece of clothing in the image target information of a frame of a TV series has the shape "suit shape 1" and the color "claret", and a first article in the preset article library has the shape "suit shape 2" and the color "jujube red", the clothing and the first article can be considered a successful match. The embodiments of the present invention place no restriction on the specific similarity condition; for example, the condition may be that the similarity exceeds a similarity threshold, which may be 0.8 or another positive number no greater than 1.
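A minimal sketch of the similarity condition, assuming each article's characteristic information is a small dict of labels. The label groups that count as "similar" and the 0.9 score for similar labels are hypothetical choices made only for illustration; the 0.8 threshold is the example value given above.

```python
# Hypothetical groups of labels treated as "similar" (e.g. claret vs. jujube red).
SIMILAR_LABELS = [
    {"suit_shape_1", "suit_shape_2"},
    {"claret", "jujube_red"},
]
FEATURE_KEYS = ("shape", "colour", "category")


def label_similarity(a, b):
    """1.0 for identical labels, 0.9 for labels in a shared similarity group, else 0.0."""
    if a == b:
        return 1.0
    if any(a in group and b in group for group in SIMILAR_LABELS):
        return 0.9
    return 0.0


def feature_similarity(second, first):
    """Mean label similarity over the features both articles actually have."""
    shared = [k for k in FEATURE_KEYS if k in second and k in first]
    if not shared:
        return 0.0
    return sum(label_similarity(second[k], first[k]) for k in shared) / len(shared)


def is_match(second, first, threshold=0.8):
    """Preset similarity condition: the similarity must exceed the threshold."""
    return feature_similarity(second, first) > threshold


clothing = {"shape": "suit_shape_1", "colour": "claret"}
library_item = {"shape": "suit_shape_2", "colour": "jujube_red", "category": "clothing"}
print(is_match(clothing, library_item))
```

Only features present on both sides are compared, so a missing category on the detected clothing does not count against the match.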

In another optional embodiment of the invention, the recognition result includes text information, and the process in step 202 of obtaining, from the preset article library, the target item that matches the recognition result may include: judging whether the text information contains information that matches the characteristic information of a first article in the preset article library or of the category to which that first article belongs; if so, taking that first article as the target item matching the text information.

Optionally, the characteristic information may include at least one of name, brand, category, and advertising slogan. Text information matching characteristic information may mean that all or part of the text information is character-for-character identical, semantically identical, semantically similar, or semantically related to the characteristic information. Optionally, text vectors can be computed separately for the text information and the characteristic information, and semantic similarity judged from the similarity between the two vectors. The embodiments of the present invention do not restrict what counts as a match between text information and characteristic information, nor the matching method used.
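The document leaves the text-vector method open. As a stand-in, here is a minimal sketch that builds character-bigram count vectors (which also work for Chinese text without word segmentation) and compares them by cosine similarity. Note that this measures surface overlap, not meaning; judging "semantically similar" text as described above would need a semantic embedding model instead of these lexical vectors.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Count vector of overlapping character n-grams (lower-cased)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def text_similarity(text, feature, n=2):
    return cosine(char_ngrams(text, n), char_ngrams(feature, n))


subtitle = "have my favourite three squirrels"
print(text_similarity(subtitle, "three squirrels"), text_similarity(subtitle, "cola"))
```

A match could then be declared when the score exceeds a threshold, in the same way as the similarity condition for image features.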

In application example 1 of the invention, suppose the subtitle of a video frame contains the text "have my favorite Three Squirrels". The text can be matched against characteristic information such as the name, brand, and category of the first articles in the preset article library. Since the text contains information matching the characteristic information of a first article, a target item whose brand is "Three Squirrels" can be obtained; a target item whose brand is "Liangpin Puzi" can also be obtained, because "Liangpin Puzi" is in the same category as "Three Squirrels".

In application example 2 of the invention, suppose the subtitle of a video frame contains the text "I want to live an excellent life". The text can be matched against the advertising slogans of the first articles in the preset article library. Suppose the matching result shows that the text matches the slogan "Youth means staying awake and striving" of a certain beverage; that beverage can then be taken as the target item.

In application example 3 of the invention, suppose the image of a video frame contains the text "GAP", i.e., a person in the image wears items (such as clothing, a hat, or a school bag) bearing the "GAP" logo. The text can be matched against characteristic information such as the name, brand, and category of the first articles in the preset article library. Since the text contains information matching the characteristic information of a first article, a target item whose brand is "GAP" can be obtained; a target item whose brand is "UNIQLO" can also be obtained, because UNIQLO is in the same or a similar category as GAP.

After step 202 obtains, from the preset article library, the target item that matches the recognition result, step 203 can add the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream, so that when a user later watches the video, the target information is shown to the user once playback reaches those video frames.

In an optional embodiment of the invention, the process in step 203 of adding the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream may include: determining a target position in the video frame for adding the target information; and adding the target information at the target position in the video frame.

In practice, the video frame can be analyzed to find, among its positions, a target position suitable for adding the target information.

In the embodiments of the present application, an audio stream may correspond to one or more video frames. In practice, the target information of the target item can be added to all the video frames corresponding to the audio stream, or only to some of them. Optionally, a target video frame suitable for adding the target information can first be selected from the video frames corresponding to the audio stream, and the target information of the target item then added to that target video frame. Optionally, the video frame corresponding to the text information that matches the target item can be used as the target video frame, which keeps the video picture synchronized with the target information. For example, if the text matching the target item is a line of dialogue in the video, the video frames corresponding to that line can serve as the target video frames suitable for adding the target information. The embodiments of the present invention do not restrict the specific target video frame; it may, for example, be a video frame after the one corresponding to the matching text: if the text matching the target item falls at the end of a line of dialogue, the next video frame after that line can serve as the target video frame.

In an optional embodiment of the invention, the process in step 203 of adding the target information corresponding to the target item to the video frames corresponding to the audio stream may include: selecting, from the video frames corresponding to the audio stream, a target video frame suitable for adding the target information; determining a target position in the target video frame for adding the target information; and adding the target information at the target position in the target video frame.

The target video frame may include the video frame corresponding to the text information that matches the target item. Specifically, selecting, from the video frames corresponding to the audio stream, the target video frame suitable for adding the target information may include: obtaining, as the target recognition result, the information in the recognition result that matches the characteristic information of the target item; extracting the part of the audio stream corresponding to the target recognition result as the target audio; and taking the video frames corresponding to the target audio as the target video frames. Here the recognition result is the text information obtained from the audio stream by speech recognition. In practice, the audio stream has a certain length, and so does the text information obtained as the recognition result. The target recognition result (for example, the target text within the text information) can therefore first be obtained from the characteristic information of the target item; the target audio is then extracted from the audio stream, and the corresponding target video frames are located from the target audio using the synchronization between the video stream and the audio stream.
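A minimal sketch of locating target video frames from the target audio, assuming the speech recognizer returns text segments with start and end timestamps in seconds, and that audio and video share one time axis starting at 0 with a constant frame rate. The `Segment` type and `target_frames` function are hypothetical names introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One speech-recognition segment with timestamps in seconds (assumed ASR output)."""
    text: str
    start: float
    end: float


def target_frames(segments, keyword, fps):
    """Indices of video frames overlapping every segment whose text contains `keyword`.

    Frame n is assumed to cover [n / fps, (n + 1) / fps).
    """
    frames = []
    for seg in segments:
        if keyword in seg.text:
            first = math.floor(seg.start * fps)
            last = math.ceil(seg.end * fps) - 1
            frames.extend(range(first, last + 1))
    return frames


segments = [
    Segment("hello", 0.0, 1.0),
    Segment("have my favourite three squirrels", 2.0, 2.5),
]
print(target_frames(segments, "three squirrels", fps=10))
```

Segment-level timestamps are coarse; word-level timestamps, where the recognizer provides them, would pin the target audio more tightly to the matched words.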

Note that when there are multiple target video frames, a target position for adding the target information can be determined separately for each target video frame. This mitigates, to some extent, the problem of a user missing the target information because a single target video frame is displayed too briefly.

In an optional embodiment of the invention, the target position may be a subtitle-related position, which may be the subtitle position itself or a position around the subtitle. When the target position is the subtitle position, the subtitle contained in the video frame can be modified according to the target information so as to add the target information into that subtitle. When the target position is around the subtitle, the target information can be added around the subtitle as additional information in the video frame.

In an optional embodiment of the invention, the target position may be consistent with the target item, which makes the video look more natural. Correspondingly, determining the target position in the video frame for adding the target information may include: determining the degree of conformity between each existing article in the video frame and the target item; and taking, as the target position, the position of an existing article whose degree of conformity meets a preset condition.

An existing article is an article contained in the video frame. In practice, the characteristic information of the existing article (such as shape, color, name, and category) can be matched against the characteristic information of the target item (such as shape, color, name, category, brand, and target information) to obtain the degree of conformity between them. If the degree of conformity meets the preset condition, the position of the existing article in the video frame is taken as the target position. Optionally, the preset condition may be that the degree of conformity exceeds a preset threshold. For example, if the target item "cola" is a canned beverage, image analysis can find the position of a can-shaped or bottle-shaped article in the video frame and use it as the target position. As another example, if the target information of the target item is a brand logo (such as that of "GAP"), the position of clothing, shoes, or hats in the video frame that are consistent with the logo can be used as the target position; clothing or shoes and hats consistent with the "GAP" logo may, for instance, be in the casual style associated with GAP. Any target position at which an article consistent with the logo appears in the video frame falls within the protection scope of the embodiments of the present invention; here, an article being consistent with the logo means that the article's position is suitable for adding the logo.
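A minimal sketch of choosing the target position by degree of conformity, assuming existing articles have already been detected with a feature dict and a bounding box. Here the degree of conformity is simply the fraction of shared feature keys on which the article agrees with the target item, and the 0.5 threshold and the bounding-box tuple layout are hypothetical choices, not values from this document.

```python
FEATURE_KEYS = ("shape", "colour", "category")


def conformity(existing, target):
    """Fraction of shared feature keys on which an existing article agrees with the target item."""
    shared = [k for k in FEATURE_KEYS if k in existing and k in target]
    if not shared:
        return 0.0
    return sum(existing[k] == target[k] for k in shared) / len(shared)


def pick_target_position(articles, target, threshold=0.5):
    """Bounding box of the best-conforming existing article, or None if none exceeds the threshold."""
    best = max(articles, key=lambda a: conformity(a["features"], target), default=None)
    if best is None or conformity(best["features"], target) <= threshold:
        return None
    return best["bbox"]


# Hypothetical detections; bbox is an assumed (left, top, right, bottom) tuple.
frame_articles = [
    {"features": {"shape": "can", "category": "drink"}, "bbox": (10, 10, 40, 80)},
    {"features": {"shape": "box", "category": "furniture"}, "bbox": (0, 0, 5, 5)},
]
cola = {"shape": "can", "category": "drink", "colour": "red"}
print(pick_target_position(frame_articles, cola))
```

Returning `None` when nothing conforms lets the caller fall back to another placement strategy, such as a subtitle-related position.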

In another optional embodiment of the invention, the target position may be the position of a preset image target area. A preset image target is an image target that does not affect the user's viewing, namely an image target other than people and the articles they wear: for example, spaces such as walls, floors, elevators, or blue sky, or objects such as furniture. Correspondingly, determining the target position in the video frame for adding the target information may include: identifying, in the video frame, a preset image target area suitable for adding the target information, and using that preset image target area as the target position.

In one application example of the invention, suppose a video frame contains a large preset image target area (such as a wall, floor, elevator, or wardrobe region). The preset image target area can be identified by image recognition technology, and the target information (such as poster information or a product display image) inserted into it. Users watching the video generally do not perceive the content of the preset image target area as anything other than video content, so the target information can be recommended while reducing its impact on the video and users' aversion to it.

In practice, step 203 can add the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream in either of the following ways:

Addition manner 1: modify the information at the target position in the video frame according to the target information, to obtain a modified video frame that includes the target information; or

Addition manner 2: add the target information to the video frame as additional information at the target position in the video frame.

Addition manner 1 adds the target information to the video frame by modifying the information at the target position in the video frame, so the content of the video frame itself changes.

In one implementation, modifying the information at the target position in the video frame may include modifying the pixel values at the target position. Specifically, the first pixel values at the target position in the video frame can be replaced with the second pixel values corresponding to the target information, where the second pixel values can be determined from target information in image format and/or from the color values (such as RGB values) of target information in text format.
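A minimal sketch of the pixel-value replacement, assuming a frame is a list of rows of (R, G, B) tuples and the target information has already been rendered into a small image ("patch") of the same form. Plain lists are used only to keep the example dependency-free; real frames would normally be array-backed.

```python
def replace_region(frame, patch, top, left):
    """Return a copy of `frame` with the pixels under `patch` replaced by the patch's pixels.

    Pixels of the patch that fall outside the frame are clipped.
    """
    out = [row[:] for row in frame]
    for dy, row in enumerate(patch):
        for dx, pixel in enumerate(row):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                out[y][x] = pixel  # first pixel value -> second pixel value
    return out


BLACK, WHITE = (0, 0, 0), (255, 255, 255)
frame = [[BLACK] * 3 for _ in range(3)]
logo = [[WHITE, WHITE], [WHITE, WHITE]]
edited = replace_region(frame, logo, top=1, left=1)
```

The function returns a new frame instead of editing in place, so the original frame remains available, for example for the tracking step that reuses targets across consecutive frames.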

In another implementation, modifying the information at the target position in the video frame may include modifying the text at the subtitle position in the video frame, that is, replacing the subtitle text with the target information in text format.

Addition manner 2 uses the target information as additional information at the target position in the video frame; the additional information may be caption information or mask information.

For example, target information in text format can serve as caption information at the target position in the video frame. If a person in the video frame is wearing clothing, the target information of the target item (such as apparel brand A) can be shown as caption information at the clothing's position, thereby recommending apparel brand A. Note that if the clothing worn by the person in the video frame already bears a brand, that brand can be removed by image processing so that two brands are not shown at once.

A mask is a layer with a certain transparency; its parameters may include size, display position, and transparency. In the embodiments of the present invention, the mask can be overlaid on the video frame, and its parameters allow it to be displayed together with the video frame: for example, while the video frame is displayed, the target information is shown through the mask at the target position in the video frame. To reduce the mask's impact on the video frame, the mask can be placed in the region occupied by the preset image target described above.
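A mask with size, display position, and transparency can be sketched as ordinary alpha blending, assuming frames and masks are lists of rows of (R, G, B) tuples. The parameter here is called `opacity` (assumed to equal 1 minus the transparency value), and the blend formula `opacity * mask + (1 - opacity) * frame` is the standard "source over" composite, used as a stand-in since the document does not give a formula.

```python
def overlay_mask(frame, mask, top, left, opacity):
    """Blend `mask` over `frame` at (top, left); opacity 1.0 fully covers, 0.0 is invisible."""
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be in [0, 1]")
    out = [row[:] for row in frame]
    for dy, row in enumerate(mask):
        for dx, m in enumerate(row):
            y, x = top + dy, left + dx
            if 0 <= y < len(out) and 0 <= x < len(out[y]):
                f = out[y][x]
                out[y][x] = tuple(
                    round(opacity * mc + (1 - opacity) * fc) for mc, fc in zip(m, f)
                )
    return out


frame = [[(0, 0, 0)] * 2 for _ in range(2)]
mask = [[(200, 100, 50)]]
blended = overlay_mask(frame, mask, top=0, left=0, opacity=0.5)
```

Because the underlying frame is only blended, not overwritten, the mask can be removed or changed at playback time without re-encoding the video.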

Application examples of adding the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream in the embodiments of the present invention include:

Application example 1: suppose a line of dialogue in the video contains the text "have my favorite Three Squirrels", and matching yields a target item whose brand is "Liangpin Puzi". Then "Three Squirrels" in the subtitle text of the video frame can be replaced with "Liangpin Puzi", giving the modified caption "have my favorite Liangpin Puzi", which is shown in the video frame after addition.

Application example 2: suppose a line of dialogue in the video frame contains the text "I want to live an excellent life", and this text matches the advertising slogan "Youth means staying awake and striving" of a certain beverage. The beverage can then be taken as the target item, a mask placed in the region around the subtitle (such as the region above it), and the target information of the target item, such as the beverage's logo and slogan, loaded through the mask, which is shown in the video frame after addition.

Application example 3: suppose a person in the image of a video frame wears items (such as clothing, a hat, or a school bag) bearing the "GAP" logo, and matching yields a target item whose brand is "UNIQLO". Then the logo of the target item (such as the UNIQLO logo) can be added at the target position in the video frame image, or the logo of the second article in the video frame can be replaced with the logo of the target item (for example, replacing the "GAP" logo on clothing in the video frame with "UNIQLO"). The logo of the target item can be added or replaced by modifying pixel values or by using a mask. The target position can be consistent with the logo of the target item; specifically, it can be the position of an article of any item type the logo covers. For example, the item types covered by the UNIQLO logo may include clothing, hats, and so on.

Application example 4: replace the first pixel values at the target position in the video frame with the second pixel values corresponding to the target information. For example, the first pixel values of the first image corresponding to a second article in the video frame can be replaced with the second pixel values of a second image corresponding to a target item of the same category as the second article. For instance, the second article may be a first beverage in a can or bottle, and the target item of the same category a second beverage in a can or bottle; the picture of the first beverage in the video frame is then replaced with the picture of the second beverage.

Application example 5: add the logo of the target item at the target position in the video frame, or replace the logo of the second article in the video frame with the logo of the target item. The logo of the target item can be added or replaced by modifying pixel values or by using a mask. The target position can be consistent with the logo of the target item: for example, if the logo of the target item is a brand logo, the target position can be a position suitable for adding that logo, specifically the position of an article of any item type the logo covers. For example, the item types covered by the "GAP" logo may include clothing and hats, and those covered by the "NIKE" logo may include clothing, shoes and hats, and bags.

Application example 6: show the target information of the target item through a mask at the target position in the video frame, for example a logo, a product display image, target information in poster or other image format, and/or target information in text format. The target information shown through the mask can carry a link through which the user can reach the page corresponding to the target item.

In some embodiments of the invention, image tracking can also be performed on image targets in consecutive video frames of the video. Based on the tracking result, the target item already obtained for the same image target in earlier video frames can be reused for that image target in later video frames. This reduces the computation needed to obtain target items, and the repeated appearance of the target item reinforces the user's memory of it. For example, suppose video frame i (where i is the frame index, an integer greater than or equal to 0) shows a canned beverage 1 whose target item is a canned beverage 2 of the same category. Beverage 1 can then be tracked, and if it still appears in subsequent video frames i+1, i+2, …, i+M (where M is a positive integer), the target information of beverage 2 can be reused for beverage 1 in those frames until beverage 1 is recognized as having disappeared in video frame i+M+1. As a result, when playback reaches the video frames carrying the implanted target information, the user sees the target information of beverage 2 until beverage 1 is no longer shown.
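A minimal sketch of reusing target items across consecutive frames, assuming some image tracker (not shown) already assigns a stable track id to each visible image target per frame. The expensive library match runs once per track, and the cached result is dropped once the track disappears, mirroring the "until frame i+M+1" behavior described above. Function and parameter names are hypothetical.

```python
def assign_targets(frames, match_item):
    """frames: per-frame lists of track ids produced by an image tracker.

    Returns (per-frame {track_id: target_item}, number of match_item calls).
    """
    cache = {}
    calls = 0
    placements = []
    for visible in frames:
        # Forget tracks that have disappeared (e.g. beverage 1 is gone in frame i+M+1).
        cache = {t: item for t, item in cache.items() if t in visible}
        placed = {}
        for track_id in visible:
            if track_id not in cache:
                cache[track_id] = match_item(track_id)  # expensive matching, once per track
                calls += 1
            placed[track_id] = cache[track_id]
        placements.append(placed)
    return placements, calls


frames = [["can_1"], ["can_1"], ["can_1"], []]
placements, calls = assign_targets(frames, lambda track_id: "beverage_2")
```

If a target leaves and later re-enters the scene, it gets a fresh match; keeping the cache across gaps would be a reasonable alternative if the tracker supports re-identification.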

In some embodiments of the invention, a video being played can be processed in real time. Correspondingly, a first target item can be obtained for the first video frame at the current playback time, and the target information of the first target item added to the second video frame at the next playback time, where the recognition result of the second video frame matches the first target item.

Note that when consecutive video frames contain the same image target, the target item corresponding to that image target can correspond to multiple pieces of target information, and different pieces of target information can be added to different frames among the consecutive video frames, diversifying the target information shown for the target item. For example, the different target information of a target item may include its logo, a product display image, a poster, or even text.
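A minimal sketch of spreading several pieces of target information over consecutive frames that contain the same image target. The `hold` parameter (how many consecutive frames each piece stays on screen before the next one is shown) is a hypothetical addition so the display does not change on every single frame.

```python
def schedule_target_info(frame_indices, infos, hold=1):
    """Map each frame index to one piece of target information, rotating every `hold` frames."""
    if not infos or hold < 1:
        raise ValueError("need at least one piece of target information and hold >= 1")
    return {
        frame: infos[(n // hold) % len(infos)]
        for n, frame in enumerate(frame_indices)
    }


schedule = schedule_target_info([10, 11, 12, 13, 14], ["logo", "poster"], hold=2)
print(schedule)
```

At typical frame rates `hold` would be set to cover a second or more of video, so each logo or poster stays visible long enough to be noticed.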

In an alternative embodiment of the invention, the method for the embodiment of the present invention can also include: according to the targetInformation modifies to the audio stream, to obtain the modified audio stream to match with the target information.Wherein, it repairsIt may include the audio to match with target information in audio stream after changing, for example, it is assumed that the lines of video include text information" have my favorite three squirrels ", it is assumed that target item is " non-defective unit shop ", then can be by the corresponding audio modification of the linesFor " have my favorite non-defective unit shop " corresponding audio.

In one implementation, speech synthesis can be performed on the target information to obtain target audio; the target audio then replaces the audio in the audio stream that matches the target item, and the resulting audio stream serves as the modified audio stream.

Speech synthesis, also known as text-to-speech (TTS), is the technology of converting text into speech. One example is HMM-based speech synthesis (HTS, HMM-based Speech Synthesis System), built on hidden Markov models (HMM). The basic idea of HTS is to parametrically decompose the speech signal and build an HMM for each acoustic parameter; at synthesis time, the trained HMMs predict the acoustic parameters of the text to be synthesized, and these acoustic parameters are fed into a parametric synthesizer to produce the synthesized speech. The acoustic parameters may include at least one of spectral parameters and fundamental frequency parameters.

In another implementation, modifying the audio stream may include: obtaining the voice features of the audio stream; using those voice features to perform speech synthesis on the target information, obtaining target audio; and replacing the audio in the audio stream that matches the target item with the target audio, the resulting audio stream serving as the modified audio stream. In this implementation, the voice features can be used to determine the acoustic parameters for speech synthesis, which keeps the replaced audio consistent in voice characteristics with the audio in the stream that was not replaced.

Optionally, the voice features may include voiceprint features. A voiceprint is the spectrum of sound waves carrying verbal information, as displayed by electroacoustic instruments; it is both specific to a speaker and relatively stable. By using the voiceprint features of the audio stream to synthesize speech for the target information, the embodiments of the present invention can make the synthesized target audio match the original voice of the audio stream, preserving the integrity of the video content.

In an optional embodiment of the invention, the modified audio stream can be aligned on the time axis with the audio stream before modification (the original audio stream). This alignment keeps the modified and original audio streams consistent on the time axis, which prevents the modification of the audio stream from affecting audio-video synchronization. Suppose the first audio in the original audio stream corresponds to the text "have my favorite Three Squirrels", and the second audio in the modified audio stream corresponds to the modified text "have my favorite Liangpin Puzi". Then the timing of the first audio in the original audio stream is consistent with the timing of the second audio in the modified audio stream: specifically, the first audio and the second audio have the same duration, and the start and end times of the first audio in the original audio stream coincide with the start and end times of the second audio in the modified audio stream.
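A minimal sketch of the time-axis alignment, assuming audio is a plain list of samples at one sample rate and the start and end sample indices of the first audio are known. The synthesized second audio is fitted to exactly the original span, so its start time, end time, and duration match. Naive linear-interpolation resampling is used only as a simple stand-in: it also shifts pitch, whereas a real system would more likely use a time-scale modification method (such as WSOLA or a phase vocoder) that preserves pitch.

```python
def resample_to_length(samples, length):
    """Linearly interpolate `samples` onto exactly `length` points (naive; shifts pitch)."""
    if not samples:
        raise ValueError("samples must not be empty")
    if length <= 0:
        return []
    if length == 1 or len(samples) == 1:
        return [samples[0]] * length
    step = (len(samples) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def splice_aligned(original, synthesized, start, end):
    """Replace original[start:end] with `synthesized` fitted to exactly end - start samples."""
    fitted = resample_to_length(synthesized, end - start)
    return original[:start] + fitted + original[end:]


result = splice_aligned([0.0] * 10, [1.0, 3.0], start=2, end=6)
```

Because the spliced segment always has `end - start` samples, the total stream length and every sample after the edit stay exactly where they were, which is what keeps audio and video in sync.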

Note that after the target item matching a piece of text information is obtained, the mapping between the text information and the target item can be recorded, so that for text information corresponding to the audio stream, the matching target item can be obtained through the mapping. This reduces the computation needed to obtain target items, and the repeated appearance of the target item reinforces the user's memory of it. For example, if "Three Squirrels" appears repeatedly in the dialogue of the audio stream, then after the target item "Liangpin Puzi" is obtained for "Three Squirrels" for the first time, a mapping from "Three Squirrels" to "Liangpin Puzi" can be established; later occurrences of "Three Squirrels" can then be mapped to the matching target item "Liangpin Puzi" directly.
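A minimal sketch of recording the text-to-target-item mapping, assuming an expensive matcher function is available; the class name, method names, and the toy matcher are hypothetical. The counter only exists to show that repeated occurrences skip re-matching.

```python
class TextToItemMap:
    """Records text -> target item mappings so repeated dialogue skips re-matching."""

    def __init__(self, match_text):
        self._match_text = match_text  # expensive matcher against the preset article library
        self._mapping = {}
        self.match_calls = 0

    def lookup(self, text):
        if text not in self._mapping:
            self._mapping[text] = self._match_text(text)
            self.match_calls += 1
        return self._mapping[text]


mapping = TextToItemMap(lambda text: "Liangpin Puzi" if "Three Squirrels" in text else None)
for line in ["Three Squirrels", "Three Squirrels", "Three Squirrels"]:
    item = mapping.lookup(line)
```

Unmatched text is cached too (as `None`), so lines with no target item are not re-matched either.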

In an alternative embodiment of the invention, the method for the embodiment of the present invention can also include: to obtain locating for equipmentThe corresponding object language in geographic area and the geographic area；It is translated as the corresponding text information of audio stream to meet instituteState the target text information of object language；The target text information is added corresponding in the video flowing and/or audio streamIn video frame.Wherein, equipment can be equipment used by a user, and the embodiment of the present invention can be for geographic region locating for userThe corresponding text information of audio stream (such as lines, the lyrics) are carried out machine translation, different language user may be implemented in this way by domainThe purpose of video content can be understood.The granularity of above-mentioned geographic area can be country etc., in this way, in American-European regionUser, the corresponding text information of audio stream can be translated as English from a kind of language (such as Chinese).Certainly, above-mentioned geographic regionThe granularity in domain can also be provinces and cities etc., in this way, the corresponding text information of audio stream can be translated from a kind of language (such as Chinese)For the dialect (such as northeast dialect, Sichuan dialect, Guangdong dialect) in some region.

In summary, the video processing method of the embodiment of the present invention automatically recognizes the information in video frames by machine, obtains the target item in the preset article library that matches the recognition result, and adds the target information corresponding to that target item to the video frames. The embodiment can quickly obtain the target item that matches the recognition result for the video stream and/or audio stream without manual intervention. It can therefore shorten video processing time and improve video processing efficiency.

It should be noted that, for simplicity of description, the method embodiments are expressed as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments some steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions involved are not necessarily required by the embodiments of the present invention.

Apparatus embodiment

Referring to Fig. 3, a structural block diagram of an embodiment of a video processing apparatus of the present invention is shown. The apparatus may specifically include an identification module 301, a target item obtaining module 302 and a target information adding module 303.

The identification module 301 is configured to recognize the video stream and/or audio stream corresponding to a video to obtain a corresponding recognition result;

The target item obtaining module 302 is configured to obtain, from a preset article library, a target item matching the recognition result;

The target information adding module 303 is configured to add the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream.
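The three modules form a simple pipeline: recognize, look up, then add. The sketch below wires them together as pluggable callables. `Frame`, `VideoProcessor` and the callable signatures are hypothetical stand-ins for modules 301, 302 and 303. They do not describe the patent's actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Frame:
    index: int
    overlays: List[str] = field(default_factory=list)  # target information added to this frame


@dataclass
class VideoProcessor:
    # Hypothetical stand-ins for modules 301, 302 and 303.
    identify: Callable[[List[Frame], bytes], str]           # -> recognition result
    acquire: Callable[[str], Optional[str]]                 # -> target item or None
    add: Callable[[List[Frame], str], List[Frame]]          # -> frames with target info

    def process(self, frames: List[Frame], audio: bytes) -> List[Frame]:
        recognition_result = self.identify(frames, audio)
        target_item = self.acquire(recognition_result)
        if target_item is None:
            return frames  # nothing in the preset article library matched
        return self.add(frames, target_item)
```

Passing each stage as a callable lets image, text or speech recognizers and different insertion strategies be swapped without changing the pipeline.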

Optionally, the identification module 301 may include:

Optionally, the target item obtaining module 302 may include:

A first judging submodule, configured to, when the recognition result includes image target information, judge whether the image target information includes a second article that is identical or similar to, or in the same category as, a first article in the preset article library, and if so, take the first article as the target item matching the recognition result; and/or

A second judging submodule, configured to, when the recognition result includes text information, judge whether the text information includes information matching the feature information corresponding to the first article in the preset article library or to the category of the first article, and if so, take the first article as the target item matching the text information.
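A minimal sketch of the second judging submodule, assuming that the "feature information" of each article is a list of keywords (such as a brand name or category words) and that a match means one keyword appears in the recognized text. The library layout and the function name are hypothetical. A production system would more likely use tokenization or semantic matching than plain substring tests.

```python
from typing import Dict, List, Optional


def match_text(text: str, library: Dict[str, List[str]]) -> Optional[str]:
    """Return the first article whose feature keywords appear in the recognized text."""
    lowered = text.lower()
    for article, keywords in library.items():
        # keywords: hypothetical feature information for the article or its category
        if any(k.lower() in lowered for k in keywords):
            return article
    return None
```

For example, with `{"Liangpin Puzi": ["nuts", "snacks", "Three Squirrels"]}` as the library, the line "I love these nuts" maps to "Liangpin Puzi".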

Optionally, the first judging submodule may include:

A matching unit, configured to match the feature information of the second article included in the image target against the feature information of the first article in the preset article library to obtain a corresponding matching result;

A target item determination unit, configured to, if the matching result is a successful match, determine that the recognition result includes a target item that is identical or similar to, or in the same category as, the first article in the preset article library.
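The matching unit and the determination unit can be sketched together, assuming that both the detected second article and each first article in the library are represented by feature vectors. Under that assumption, a "successful match" means the cosine similarity reaches a threshold. The vector representation, the 0.9 threshold and the function names are assumptions, not details from the source.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_image_feature(
    feature: Sequence[float],
    library: Dict[str, Sequence[float]],
    threshold: float = 0.9,  # assumed similarity cut-off for a "successful match"
) -> Optional[str]:
    """Return the library article most similar to the detected article, if any passes the threshold."""
    best_article, best_score = None, threshold
    for article, reference in library.items():
        score = cosine(feature, reference)
        if score >= best_score:
            best_article, best_score = article, score
    return best_article  # None means the match failed
```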

Optionally, the target information adding module 303 may include:

Optionally, the target position determining submodule may include:

Optionally, the target video frame selecting unit may include:

Optionally, the target position determining submodule may include:

Optionally, the target position is a subtitle-related position;

The adding submodule may include:

A subtitle modifying unit, configured to modify the subtitle included in the video frame according to the target information, so as to add the target information to the subtitle included in the video frame; and/or

Optionally, the target information adding module 303 may include:

Optionally, the apparatus may further include:

Optionally, the audio stream modifying module may include:

Optionally, the apparatus may further include: a time axis alignment module, configured to align the time axis of the modified audio stream with that of the audio stream before modification.

Because the apparatus embodiment is basically similar to the method embodiment, it is described relatively briefly; for the relevant details, refer to the description of the method embodiment.

The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to one another.

For the apparatus in the above embodiment, the specific manner in which each module performs its operations has been described in detail in the method embodiment and is not elaborated here.

An embodiment of the present invention provides an apparatus for video processing. The apparatus may include a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations: recognizing the video stream and/or audio stream corresponding to a video to obtain a corresponding recognition result; obtaining, from a preset article library, a target item matching the recognition result; and adding the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream.

Optionally, recognizing the video stream and/or audio stream corresponding to the video includes: performing image recognition on the video stream corresponding to the video to obtain corresponding image target information; and/or performing text recognition on the video stream corresponding to the video to obtain corresponding text information; and/or performing speech recognition on the audio stream corresponding to the video to obtain corresponding text information.

Optionally, obtaining, from the preset article library, the target item matching the recognition result includes: when the recognition result includes image target information, judging whether the image target information includes a second article that is identical or similar to, or in the same category as, a first article in the preset article library, and if so, taking the first article as the target item matching the recognition result; and/or, when the recognition result includes text information, judging whether the text information includes information matching the feature information corresponding to the first article in the preset article library or to the category of the first article, and if so, taking the first article as the target item matching the text information.

Optionally, judging whether the image target includes a second article that is identical or similar to, or in the same category as, the first article in the preset article library includes: matching the feature information of the second article included in the image target against the feature information of the first article in the preset article library to obtain a corresponding matching result; and, if the matching result is a successful match, determining that the recognition result includes a target item that is identical or similar to, or in the same category as, the first article in the preset article library.

Optionally, adding the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream includes: determining a target position for adding the target information in the video frames corresponding to the video stream and/or audio stream; and adding the target information at the target position in the video frames.

Optionally, determining the target position for adding the target information in the video frames corresponding to the video stream and/or audio stream includes: selecting, from the video frames corresponding to the audio stream, target video frames suitable for adding the target information; and determining the target position for adding the target information in the target video frames.

Optionally, selecting, from the video frames corresponding to the audio stream, the target video frames suitable for adding the target information includes: obtaining, as a target recognition result, the information in the recognition result that matches the feature information of the target item; extracting the part of the audio stream corresponding to the target recognition result as target audio; and taking the video frames corresponding to the target audio as the target video frames.
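One way to map the target audio back to video frames is to use the timestamps that a speech recognizer attaches to each recognized segment. This sketch assumes that segments carry start and end times in seconds and that the video has a constant frame rate. It selects every frame whose timestamp falls in `[start, end)` of a segment containing the target item's keywords. `AsrSegment` and `target_frames` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class AsrSegment:
    text: str
    start: float  # seconds (assumed ASR timestamp)
    end: float    # seconds, exclusive


def target_frames(segments: Sequence[AsrSegment], keywords: Sequence[str], fps: float) -> List[int]:
    """Indices of video frames that coincide with speech mentioning the target item."""
    frames = set()
    for seg in segments:
        if any(k in seg.text for k in keywords):  # this segment is the target audio
            first = int(seg.start * fps)
            last = int(math.ceil(seg.end * fps))  # exclusive upper bound
            frames.update(range(first, last))
    return sorted(frames)
```

For example, at 4 frames per second a mention spoken from 1.0 s to 1.5 s selects frames 4 and 5.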

Optionally, determining the target position for adding the target information in the video frames corresponding to the video stream and/or audio stream includes: determining the degree of conformity between the existing articles in those video frames and the target item, and obtaining, from the existing articles in the video frames, the position of an article whose degree of conformity meets a preset condition as the target position; and/or identifying, in the video frames corresponding to the video stream and/or audio stream, a predicted image target region suitable for adding the target information, and taking the predicted image target region as the target position.
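The first branch above can be sketched as scoring each detected object against the target item and keeping the bounding boxes whose score meets a threshold, best first. The `conformity` scoring function, the box format `(x, y, width, height)` and the 0.5 threshold are assumptions for illustration. The source does not specify how the degree of conformity is computed.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # assumed (x, y, width, height) in pixels


@dataclass
class DetectedObject:
    label: str  # category of an existing article found in the frame
    box: Box


def candidate_positions(
    objects: Sequence[DetectedObject],
    target_category: str,
    conformity: Callable[[str, str], float],  # hypothetical scorer in [0, 1]
    threshold: float = 0.5,                   # assumed "preset condition"
) -> List[Box]:
    """Positions of existing articles whose conformity with the target item meets the threshold."""
    scored = [(conformity(o.label, target_category), o.box) for o in objects]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return [box for score, box in scored if score >= threshold]
```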

Optionally, the target position is a subtitle-related position, and adding the target information at the target position in the video frame includes: modifying the subtitle included in the video frame according to the target information, so as to add the target information to that subtitle; and/or adding the target information around the subtitle as additional information of the subtitle in the video frame, so as to add the target information to the video frame.
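The two subtitle options can be sketched as string operations on the subtitle text. The first function rewrites the mention inside the subtitle, and the second attaches the target information next to it. The function names and the `" | "` separator are illustrative assumptions. Actually drawing the result into the frame would be handled by a separate rendering step that is not shown.

```python
def modify_subtitle(subtitle: str, mention: str, target_info: str) -> str:
    """Option 1: rewrite the subtitle so that the mention carries the target information."""
    if mention in subtitle:
        return subtitle.replace(mention, target_info)  # replaces every occurrence
    return subtitle


def annotate_subtitle(subtitle: str, target_info: str, separator: str = " | ") -> str:
    """Option 2: keep the subtitle and attach the target information beside it."""
    return f"{subtitle}{separator}{target_info}"
```

For example, `modify_subtitle("Pass me the nuts", "nuts", "Liangpin Puzi nuts")` yields "Pass me the Liangpin Puzi nuts".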

Optionally, adding the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream includes: modifying, according to the target information, the information at the corresponding target position in those video frames to obtain modified video frames; or using the target information as additional information at the corresponding target position in those video frames.

Optionally, the apparatus is further configured such that the one or more programs executed by the one or more processors include instructions for: modifying the audio stream according to the target information to obtain a modified audio stream that matches the target information.

Optionally, modifying the audio stream includes: obtaining the voice features corresponding to the audio stream; performing speech synthesis on the target information using those voice features to obtain target audio; and replacing, with the target audio, the audio in the audio stream that matches the target item, the resulting audio stream being the modified audio stream.
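A minimal sketch of the replacement step, assuming that the audio is a list of samples and that the matching audio occupies the sample range `[start, end)`. The `synthesize` callable is a hypothetical text-to-speech hook that takes the extracted voice features. This naive splice can change the stream's length, which is why the time-axis alignment described next is needed.

```python
from typing import Callable, List, Sequence

# Hypothetical TTS hook: (text, voice_features) -> samples in the original speaker's voice.
Synthesizer = Callable[[str, dict], List[float]]


def replace_segment(samples: Sequence[float], start: int, end: int, clip: Sequence[float]) -> List[float]:
    """Splice clip in place of samples[start:end]."""
    if not 0 <= start <= end <= len(samples):
        raise ValueError("segment out of range")
    return list(samples[:start]) + list(clip) + list(samples[end:])


def modify_audio(
    samples: Sequence[float],
    start: int,
    end: int,
    target_info: str,
    voice_features: dict,
    synthesize: Synthesizer,
) -> List[float]:
    target_audio = synthesize(target_info, voice_features)  # speech synthesis step
    return replace_segment(samples, start, end, target_audio)
```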

Optionally, the apparatus is further configured such that the one or more programs executed by the one or more processors include instructions for: aligning the time axis of the modified audio stream with that of the audio stream before modification.
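One simple way to keep the time axes aligned is to resample the synthesized clip to exactly the length of the segment it replaces, so every later sample keeps its original timestamp. The sketch below uses linear interpolation as an assumed stand-in. Unlike a pitch-preserving time-stretch algorithm (for example WSOLA or a phase vocoder), plain resampling also shifts pitch, so it only illustrates the length constraint.

```python
from typing import List, Sequence


def stretch_to_length(clip: Sequence[float], length: int) -> List[float]:
    """Linearly resample clip to exactly `length` samples (naive; also shifts pitch)."""
    if length <= 0:
        return []
    if len(clip) == 0:
        return [0.0] * length  # nothing to stretch: fill with silence
    if len(clip) == 1 or length == 1:
        return [float(clip[0])] * length
    scale = (len(clip) - 1) / (length - 1)  # maps output index -> input position
    out = []
    for i in range(length):
        pos = i * scale
        lo = int(pos)
        hi = min(lo + 1, len(clip) - 1)
        frac = pos - lo
        out.append(clip[lo] * (1 - frac) + clip[hi] * frac)
    return out
```

Stretching the synthesized clip to `end - start` samples before splicing it in leaves the total length of the audio stream unchanged.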

Fig. 4 is a block diagram of an apparatus 900 for video processing, implemented as a terminal, according to an exemplary embodiment. For example, the apparatus 900 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, etc.

Referring to Fig. 4, the apparatus 900 may include one or more of the following components: a processing component 902, a memory 904, a power component 906, a multimedia component 908, an audio component 910, an input/output (I/O) interface 912, a sensor component 914, and a communication component 916.

The processing component 902 generally controls the overall operation of the apparatus 900, such as operations associated with display, telephone calls, data communication, camera operation and recording. The processing component 902 may include one or more processors 920 to execute instructions so as to perform all or part of the steps of the methods described above. In addition, the processing component 902 may include one or more modules to facilitate interaction between the processing component 902 and other components. For example, the processing component 902 may include a multimedia module to facilitate interaction between the multimedia component 908 and the processing component 902.

The memory 904 is configured to store various types of data to support operation of the apparatus 900. Examples of such data include instructions for any application or method operated on the apparatus 900, contact data, phonebook data, messages, pictures, videos, etc. The memory 904 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disc.

The power component 906 supplies power to the various components of the apparatus 900. The power component 906 may include a power management system, one or more power supplies, and other components associated with generating, managing and distributing power for the apparatus 900.

The multimedia component 908 includes a screen that provides an output interface between the apparatus 900 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 908 includes a front camera and/or a rear camera. When the apparatus 900 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data. Each front camera and rear camera may be a fixed optical lens system or have focusing and optical zoom capabilities.

The audio component 910 is configured to output and/or input audio signals. For example, the audio component 910 includes a microphone (MIC), which is configured to receive external audio signals when the apparatus 900 is in an operating mode, such as a call mode, a recording mode or a voice recognition mode. The received audio signal may be further stored in the memory 904 or sent via the communication component 916. In some embodiments, the audio component 910 further includes a speaker for outputting audio signals.

The I/O interface 912 provides an interface between the processing component 902 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, etc. These buttons may include, but are not limited to, a home button, volume buttons, a start button and a lock button.

The sensor component 914 includes one or more sensors for providing status assessments of various aspects of the apparatus 900. For example, the sensor component 914 may detect the open/closed state of the apparatus 900 and the relative positioning of components, such as the display and keypad of the apparatus 900. The sensor component 914 may also detect a change in position of the apparatus 900 or of one of its components, the presence or absence of user contact with the apparatus 900, the orientation or acceleration/deceleration of the apparatus 900, and changes in its temperature. The sensor component 914 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 914 may also include an optical sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 914 may further include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor or a temperature sensor.

The communication component 916 is configured to facilitate wired or wireless communication between the apparatus 900 and other devices. The apparatus 900 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 916 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 916 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.

In an exemplary embodiment, the apparatus 900 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors or other electronic components, for performing the above methods.

In an exemplary embodiment, a non-transitory computer-readable storage medium including instructions is also provided, such as the memory 904 including instructions, which can be executed by the processor 920 of the apparatus 900 to perform the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.

Fig. 5 is a schematic structural diagram of a server in some embodiments of the present invention. The server 1900 may vary considerably depending on configuration or performance. It may include one or more central processing units (CPU) 1922 (for example, one or more processors), a memory 1932, and one or more storage media 1930 (for example, one or more mass storage devices) that store application programs 1942 or data 1944. The memory 1932 and the storage medium 1930 may provide transient or persistent storage. The programs stored in the storage medium 1930 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations for the server. Further, the central processing unit 1922 may be configured to communicate with the storage medium 1930 and to execute, on the server 1900, the series of instruction operations in the storage medium 1930.

The server 1900 may further include one or more power supplies 1926, one or more wired or wireless network interfaces 1950, one or more input/output interfaces 1958, one or more keyboards 1956, and/or one or more operating systems 1941, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

A non-transitory computer-readable storage medium is provided. When the instructions in the storage medium are executed by a processor of an apparatus (a terminal or a server), the apparatus is enabled to perform a video processing method, the method including: recognizing the video stream and/or audio stream corresponding to a video to obtain a corresponding recognition result; obtaining, from a preset article library, a target item matching the recognition result; and adding the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream.

Other embodiments of the present invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. The present invention is intended to cover any variations, uses or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the invention indicated by the following claims.

It should be understood that the present invention is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present invention is limited only by the appended claims.

The foregoing describes merely preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

A video processing method, a video processing apparatus and an apparatus for video processing provided by the present invention have been described in detail above. Specific examples have been used herein to explain the principles and implementations of the invention; the above description of the embodiments is only intended to help understand the method of the invention and its core idea. Meanwhile, those of ordinary skill in the art may, in accordance with the idea of the invention, make changes to the specific implementations and the scope of application. In summary, the contents of this specification should not be construed as limiting the present invention.

Claims

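1. A video processing method, characterized by comprising:

Recognizing the video stream and/or audio stream corresponding to a video to obtain a corresponding recognition result;

Obtaining, from a preset article library, a target item matching the recognition result; and

Adding the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream.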

2. The method according to claim 1, wherein recognizing the video stream and/or audio stream corresponding to the video includes:

Performing image recognition on the video stream corresponding to the video to obtain corresponding image target information; and/or

Performing text recognition on the video stream corresponding to the video to obtain corresponding text information; and/or

Performing speech recognition on the audio stream corresponding to the video to obtain corresponding text information.

3. The method according to claim 1, wherein obtaining, from the preset article library, the target item matching the recognition result includes:

When the recognition result includes image target information, judging whether the image target information includes a second article that is identical or similar to, or in the same category as, a first article in the preset article library, and if so, taking the first article as the target item matching the recognition result; and/or

When the recognition result includes text information, judging whether the text information includes information matching the feature information corresponding to the first article in the preset article library or to the category of the first article, and if so, taking the first article as the target item matching the text information.

4. The method according to claim 3, wherein judging whether the image target includes a second article that is identical or similar to, or in the same category as, the first article in the preset article library includes:

Matching the feature information of the second article included in the image target against the feature information of the first article in the preset article library to obtain a corresponding matching result;

If the matching result is a successful match, determining that the recognition result includes a target item that is identical or similar to, or in the same category as, the first article in the preset article library.

5. The method according to claim 1, wherein adding the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream includes:

Determining a target position for adding the target information in the video frames corresponding to the video stream and/or audio stream;

Adding the target information at the target position in the video frames.

6. The method according to claim 5, wherein determining the target position for adding the target information in the video frames corresponding to the video stream and/or audio stream includes:

Selecting, from the video frames corresponding to the audio stream, target video frames suitable for adding the target information;

Determining the target position for adding the target information in the target video frames.

7. The method according to claim 6, wherein selecting, from the video frames corresponding to the audio stream, the target video frames suitable for adding the target information includes:

Obtaining, as a target recognition result, the information in the recognition result that matches the feature information of the target item;

Extracting the part of the audio stream corresponding to the target recognition result as target audio;

Taking the video frames corresponding to the target audio as the target video frames.

8. The method according to claim 5, wherein determining the target position for adding the target information in the video frames corresponding to the video stream and/or audio stream includes:

Determining the degree of conformity between the existing articles in the video frames corresponding to the video stream and/or audio stream and the target item, and obtaining, from the existing articles in the video frames, the position of an article whose degree of conformity meets a preset condition as the target position;

And/or

Identifying, in the video frames corresponding to the video stream and/or audio stream, a predicted image target region suitable for adding the target information, and taking the predicted image target region as the target position.

9. The method according to claim 5, wherein the target position is a subtitle-related position;

Adding the target information at the target position in the video frame includes:

Modifying the subtitle included in the video frame according to the target information, so as to add the target information to the subtitle included in the video frame;

And/or

Adding the target information around the subtitle as additional information of the subtitle in the video frame, so as to add the target information to the video frame.

10. The method according to claim 1, wherein adding the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream includes:

Modifying, according to the target information, the information at the corresponding target position in the video frames corresponding to the video stream and/or audio stream to obtain modified video frames; or

Using the target information as additional information at the corresponding target position in the video frames corresponding to the video stream and/or audio stream.

11. The method according to any one of claims 1 to 10, wherein the method further includes:

Modifying the audio stream according to the target information to obtain a modified audio stream that matches the target information.

12. The method according to claim 11, wherein modifying the audio stream includes:

Obtaining the voice features corresponding to the audio stream;

Performing speech synthesis on the target information using the voice features to obtain target audio;

Replacing, with the target audio, the audio in the audio stream that matches the target item, the resulting audio stream being the modified audio stream.

13. The method according to claim 11, wherein the method further includes:

Aligning the time axis of the modified audio stream with that of the audio stream before modification.

14. A video processing apparatus, characterized by comprising:

An identification module, configured to recognize the video stream and/or audio stream corresponding to a video to obtain a corresponding recognition result;

A target item obtaining module, configured to obtain, from a preset article library, a target item matching the recognition result; and

A target information adding module, configured to add the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream.

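15. An apparatus for video processing, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:

Recognizing the video stream and/or audio stream corresponding to a video to obtain a corresponding recognition result;

Obtaining, from a preset article library, a target item matching the recognition result; and

Adding the target information corresponding to the target item to the video frames corresponding to the video stream and/or audio stream.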

16. A machine-readable medium having instructions stored thereon which, when executed by one or more processors, cause an apparatus to perform the video processing method according to one or more of claims 1 to 13.