CN101079996A - An interactive digital multimedia making method based on video and audio - Google Patents

An interactive digital multimedia making method based on video and audio

Info

Publication number
CN101079996A
CN101079996A (application CN200610081465; granted as CN100596186C)
Authority
CN
China
Prior art keywords
video
information
audio
image
audio frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 200610081465
Other languages
Chinese (zh)
Other versions
CN100596186C (en)
Inventor
侯启槟
王阳生
曾祥永
鲁鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Interjoy Technology Limited
Original Assignee
BEIJING INTERJOY TECHNOLOGY Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING INTERJOY TECHNOLOGY Ltd
Priority to CN200610081465A
Publication of CN101079996A
Application granted
Publication of CN100596186C
Expired - Fee Related
Anticipated expiration


Abstract

This invention discloses an interactive digital multimedia production method based on audio and video, comprising: step 1, capturing live video images in real time and preprocessing them to obtain preliminary video information; step 2, converting the preliminary video information into video control information; step 3, capturing live audio data in real time and preprocessing it to obtain preliminary audio information; step 4, converting the preliminary audio information into audio control information. Steps 1 and 2 form one step group, and steps 3 and 4 form a second; the two groups are independent of each other, and once both have finished the method enters step 5: fusing the video and audio control information, changing the content of the body (the multimedia file), and outputting it.

Description

An interactive digital multimedia production method based on video and audio
Technical field
The present invention relates to a computer human-machine interaction method, and in particular to an interactive digital multimedia production method based on video and audio.
Background technology
In recent years, with the innovation of information technology, the wide application of multimedia technology, and the rapid development of the communication-media industry, the concepts and forms of all kinds of media releases (such as advertisements) have multiplied and diversified. However, once the concept and form of a traditional media release are fixed, it suffers from uniformity, one-way delivery, and repetitiveness. Although advances in computer vision and speech recognition have made natural human-machine interaction through vision and voice feasible, it remains a challenge when authoring such multimedia files to let the audience interact with a media release without physical contact, to make the release incorporate the motion and sound of the audience and the surrounding scene as much as possible, and to have the release content change differently with different interactions, thereby improving the interactivity of the release and the engagement of the audience.
Summary of the invention
The technical problem to be solved by the present invention is to provide an interactive digital multimedia production method based on video and audio that produces multimedia files through human-machine interaction.
To solve the above technical problem, the present invention comprises the following steps. Start. Step 1: capture live video images in real time with a digital optical device and preprocess them to obtain preliminary video information. Step 2: process the preliminary video information from step 1 and convert it into video control information. Step 3: capture live audio data in real time with a digital audio device and preprocess it to obtain preliminary audio information. Step 4: process the preliminary audio information from step 3 and convert it into audio control information. Steps 1 and 2 form sequentially executed step group one, and steps 3 and 4 form sequentially executed step group two; the two groups are independent of each other and may or may not run simultaneously, and in either case the method enters step 5 once both have finished. Step 5: fuse and process the video control information and audio control information, output control commands for the body, drive the body through a control interface with those commands, change the body's content, and output it, where the body is the multimedia file. End.
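The five-step structure above can be sketched in code. This is only an illustration of the control flow, not the patent's implementation: all the stage functions here are placeholder stubs, and the parallel execution of the two step groups is one of the two orderings the text permits.

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder stage implementations; real ones would perform the image and
# audio processing described in the embodiment section.
def preprocess_video(frame):       return {"motion": frame}       # step 1
def to_video_controls(info):       return [("VIDEO", info["motion"])]  # step 2
def preprocess_audio(samples):     return {"pitch": samples}      # step 3
def to_audio_controls(info):       return [("AUDIO", info["pitch"])]   # step 4
def fuse_and_drive(v, a):          return v + a                   # step 5

def run_pipeline(frame, audio_samples):
    """Step group one (steps 1-2) and step group two (steps 3-4) run
    independently, here in parallel; step 5 fuses both control streams."""
    with ThreadPoolExecutor() as pool:
        v = pool.submit(lambda: to_video_controls(preprocess_video(frame)))
        a = pool.submit(lambda: to_audio_controls(preprocess_audio(audio_samples)))
        return fuse_and_drive(v.result(), a.result())
```

Running the two groups sequentially would give the same result; the method only requires that step 5 wait for both.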
Because the present invention adopts interactive control by video and audio and converts the results into control commands for the multimedia file, it achieves direct control of the virtual elements in the multimedia file.
Description of drawings
Fig. 1 is the flow chart of the inventive method applied to advertisement production;
Fig. 2 is the flow chart of fusing and outputting the control information in Fig. 1, i.e. mapping the video and audio analysis and recognition results to the corresponding advertisement controls;
Fig. 3 shows a specific embodiment of the inventive method: the rendering of an interactive exhibition project developed for an aquarium.
Embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments.
In principle, the inventive method can be divided into an audio-based interactive digital multimedia production method and a video-based interactive digital multimedia production method.
The video-based interactive digital multimedia production method comprises the following steps:
1. capture video images in real time with a camera device and apply preprocessing such as light correction and denoising;
2. segment the video image using its variation and features in time and space, then extract and analyze features of the segmented image to obtain the global motion information and local human-body posture information in the image (position, direction, amplitude, and the basic shape parameters they form); convert this information into advertisement control commands through regularization;
3. the control interface drives the advertisement according to the control commands.
The audio-based interactive digital multimedia production method comprises the following steps:
1. collect audio data in real time from a microphone and sound card and apply preprocessing such as denoising;
2. process the collected audio with tone analysis and speech recognition to obtain the frequency value and amplitude value of the sound and the corresponding semantic vocabulary recognition result, and convert them into advertisement control commands;
3. the control interface drives the advertisement according to the control commands.
It must be emphasized that the two methods above can be used independently or in combination.
The present invention is further elaborated below with an embodiment that applies the method to advertisement production. Fig. 1 is the flow chart of this embodiment, in which steps (1-5) and steps (6-10) can be used separately or applied in parallel.
As shown in Fig. 1, the concrete steps of this embodiment are as follows:
(1) Obtain video images: acquire real-time images from a camera connected to the computer through a high-speed image-capture module. Because every frame must be processed, images are extracted from the video stream frame by frame. Depending on the application, the camera may face the people and scenery in the venue or shoot them from above;
(2) Denoising and other preprocessing: to improve the precision and speed of the subsequent motion-information and posture-information extraction, the frames obtained in step (1) must be preprocessed. First, to reduce the computational load and improve speed, the captured color image is reduced to 1/4 of its original resolution and converted to a 256-level grayscale image. Second, each frame is smoothed by averaging corresponding pixels in space (within a frame) and time (between frames), removing the random noise introduced during capture. In addition, brightness is compensated to eliminate the influence of illumination changes: from each pixel value the mean of all pixel values in the image is subtracted, the result is divided by the variance of all pixel values, and it is then multiplied by a coefficient. This processing eliminates the influence of lighting changes to a certain extent;
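As an illustration, the preprocessing of step (2) can be sketched as follows. The 1/4-resolution reduction is interpreted here as halving each dimension, and the scaling coefficient and grayscale weights are assumed values, since the patent specifies neither.

```python
import numpy as np

def preprocess_frame(frame_rgb, coeff=64.0):
    """Sketch of step (2): shrink the colour image to 1/4 resolution,
    convert to a 256-level grey image, and compensate brightness as
    (pixel - mean) / variance * coefficient. `coeff` is illustrative."""
    # Reduce to 1/4 of the original resolution by halving each dimension.
    small = frame_rgb[::2, ::2]
    # Convert to greyscale with standard luminance weights (an assumption).
    gray = small @ np.array([0.299, 0.587, 0.114])
    # Brightness compensation: subtract the global mean, divide by the
    # variance of the whole image, then multiply by a coefficient.
    norm = (gray - gray.mean()) / (gray.var() + 1e-9) * coeff
    # Shift back into the 256-level range and quantise.
    return np.clip(norm + 128.0, 0, 255).astype(np.uint8)
```

The inter-frame smoothing the text also mentions would average this result with the previous preprocessed frame.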
(3) Motion-information extraction: to support the subsequent posture extraction, global motion information must be extracted from the image processed in step (2). First, subtract each pixel of the previous frame from the corresponding pixel of the current frame and take the absolute value of the result, obtaining an inter-frame difference image. Then threshold the difference image, judging whether each pixel is greater than or equal to, or less than, a fixed threshold, to obtain a binary image describing the moving region (0 means less than, 1 means greater than or equal to). Finally, apply edge extraction to this binary image to obtain the edges of the moving region. In addition, for a fixed region, the amplitude, direction, and speed parameters of its motion can be derived from the proportion of 1-pixels in the region, the position of its center of gravity, and historical information;
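The frame-differencing and thresholding of step (3) can be sketched directly; the threshold value here is an assumption, as the text only says "a certain fixed threshold".

```python
import numpy as np

def motion_mask(prev_gray, curr_gray, thresh=30):
    """Step (3): absolute inter-frame difference, then a fixed threshold,
    yielding a binary motion image (1 = moving, 0 = static)."""
    diff = np.abs(curr_gray.astype(int) - prev_gray.astype(int))
    return (diff >= thresh).astype(np.uint8)

def region_stats(mask):
    """Per step (3): the proportion of 1-pixels and the centre of gravity
    of the moving region, from which amplitude, direction, and speed can
    be derived by tracking over successive frames."""
    ys, xs = np.nonzero(mask)
    ratio = mask.mean()
    centroid = (ys.mean(), xs.mean()) if len(ys) else None
    return ratio, centroid
```

Edge extraction on the mask (e.g. by contour following) would complete the step as described.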
(4) Posture-information extraction: based on the motion-extraction result of step (3), the moving foreground is further segmented and each region is analyzed separately. The shape of the edge contour in the specific region of the aforementioned binary image and the process of its change are analyzed, features with rotation and scaling invariance are extracted, the corresponding posture information is derived, and the result from the previous moment is used for tracking, verification, and prediction;
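Step (4) names the property of the shape features (rotation and scaling invariance) but not the features themselves. One standard choice with that property is image moments; the sketch below computes the first two Hu moments of a binary region mask, purely as an example of what such a feature could be.

```python
import numpy as np

def hu_like_invariants(mask):
    """Rotation/scale-invariant shape features for a binary foreground
    region: the first two Hu moments, built from normalised central
    moments eta_pq = mu_pq / m00**(1 + (p+q)/2)."""
    ys, xs = np.nonzero(mask)
    m00 = len(xs)                      # region area (zeroth moment)
    if m00 == 0:
        return 0.0, 0.0
    xb, yb = xs.mean(), ys.mean()      # centre of gravity

    def eta(p, q):
        return (((xs - xb) ** p) * ((ys - yb) ** q)).sum() / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    h1 = n20 + n02                     # first Hu invariant
    h2 = (n20 - n02) ** 2 + 4 * n11 ** 2  # second Hu invariant
    return h1, h2
```

Tracking these values over frames would give the "change process of the shape" the step refers to.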
(5) Video control-parameter extraction and transformation: the global motion information and local human-body posture information extracted in steps (3) and (4) are converted into the corresponding control information;
(6) Obtain audio data: collect real-time audio data through a microphone and sound card;
(7) Denoising and other preprocessing: the audio collected in real time is denoised by smoothing;
(8) Tone-information extraction: tone analysis is applied to the denoised audio to extract the frequency value and amplitude value of the sound;
(9) Limited-vocabulary speech recognition: a speaker-independent continuous speech recognition method is used to recognize a few discrete limited-vocabulary commands with low real-time requirements, such as "begin" and "stop";
(10) Audio control-parameter extraction and transformation: the extracted tone information and limited-vocabulary recognition results are mapped through a predefined command set and converted into the corresponding advertisement control information;
(11) Multichannel fusion: the control information from video and audio is combined to form efficient, comprehensive advertisement control commands.
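The tone-analysis step (8) asks for the frequency value and amplitude value of the sound. The patent does not specify an algorithm; a common way to obtain both is an FFT peak over a windowed buffer, sketched here as one possible reading.

```python
import numpy as np

def tone_features(samples, sample_rate):
    """One possible tone analysis for step (8): dominant frequency (Hz)
    and its amplitude, via the peak of a Hann-windowed FFT magnitude
    spectrum. The windowing choice is an assumption."""
    spectrum = np.abs(np.fft.rfft(samples * np.hanning(len(samples))))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    peak = spectrum[1:].argmax() + 1   # skip the DC bin
    return freqs[peak], spectrum[peak]
```

Frequency could drive, say, the pitch-dependent behaviour of a virtual element, and amplitude its intensity.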
Step (11), i.e. the process of mapping the video and audio analysis and recognition results to the corresponding advertisement controls, is described in detail below. As shown in Fig. 2, the basic steps are as follows:
(1) First classify the advertisement-content control commands: the required command set is classified effectively according to the characteristics of each channel; video output is fast, intuitive, and continuous but susceptible to interference, while sound is natural and quick but its recognition is not highly real-time.
(2) Video-based control: first establish the correspondence between the various kinds of motion information and human postures and the advertisement control quantities; then capture the surrounding scenery and the audience through the camera, analyze and recognize the motion and human postures in the image in real time, apply a predictive tracking algorithm to the current state, and output the corresponding control quantities;
(3) Audio-based control: first build a keyword dictionary and a mapping table from keywords to the related commands; then collect the sound of the audience and the surrounding scene through the microphone and, according to the tone analysis and speech recognition results, produce the corresponding control commands;
(4) Through the advertisement control interface, integrate the video and audio commands in real time into the control of the advertisement's virtual elements and content, or adjust the model directly, achieving the purpose of control.
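A minimal sketch of the mapping and fusion in steps (1)-(4) above. The keyword table and command names are illustrative, not taken from the patent.

```python
# Hypothetical keyword-to-command mapping table, as in step (3): each
# recognised vocabulary word maps to one advertisement command.
KEYWORD_COMMANDS = {"begin": "START_AD", "stop": "STOP_AD"}

def fuse_controls(video_controls, recognized_words):
    """Step (4): merge video-derived control quantities with audio keyword
    commands into one ordered command list for the advertisement control
    interface. Unknown words are simply ignored."""
    commands = list(video_controls)        # e.g. [("MOVE_FISH", dx, dy)]
    for word in recognized_words:
        cmd = KEYWORD_COMMANDS.get(word)
        if cmd:
            commands.append((cmd,))
    return commands
```

In the aquarium embodiment of Fig. 3, for instance, a motion-derived command might steer a virtual fish while a spoken keyword starts or stops the display; the tuple names used here are placeholders.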
In summary, the inventive method adopts interactive control by video and audio: the motion and sound of the audience and the surrounding scene are analyzed and recognized in the computer, the results are converted into control commands for the multimedia file, and direct control of the virtual elements in the multimedia is achieved.

Claims (4)

1. An interactive method for producing digital multimedia files based on video and audio, characterized by comprising the steps of: start; step 1, capturing live video images in real time with a digital optical device and preprocessing them to obtain preliminary video information; step 2, processing the preliminary video information obtained in step 1 and converting it into video control information; step 3, capturing live audio data in real time with a digital audio device and preprocessing it to obtain preliminary audio information; step 4, processing the preliminary audio information obtained in step 3 and converting it into audio control information; wherein steps 1 and 2 form sequentially executed step group one, and steps 3 and 4 form sequentially executed step group two, said step groups one and two being independent of each other and able to run simultaneously or not, and whether or not they run simultaneously, step 5 is entered after both have finished; step 5, fusing and processing said video control information and audio control information, outputting control commands for a body, driving the body through a control interface with said control commands, changing the body's content, and outputting it, wherein said body is a multimedia file; end.
3. The interactive method for producing digital multimedia files based on video and audio according to claim 2, characterized in that: the preprocessing in step 1 comprises applying light correction and denoising to said live video images; converting the preliminary video information into video control information in step 2 comprises segmenting the video image by its variation and features in time and space, then extracting and analyzing features of the segmented image to extract global motion information and local human-body posture information, wherein said local human-body posture information comprises the position, direction, and amplitude of the human body and the basic shape parameters it forms; the preprocessing in step 3 comprises processing the live audio data with tone analysis and speech recognition; converting the preliminary audio information into audio control information in step 4 comprises extracting the frequency value and amplitude value of the sound and performing limited-vocabulary speech recognition; and the fusing and processing of said video control information and audio control information in step 5 involves a command-set preprocessing module, a video control-transformation module, and an audio control-transformation module, wherein the command-set preprocessing module classifies the video/audio command set and, according to the accepted video control information and audio control information, maps the corresponding commands to the video control-transformation module and the audio control-transformation module respectively; the video control-transformation module accepts said video control information and the commands mapped by the command-set preprocessing module and outputs video control commands for the body to the control interface, and the audio control-transformation module accepts said audio control information and the commands mapped by the command-set preprocessing module and outputs audio control commands for the body to the control interface.
4. The interactive method for producing digital multimedia files based on video and audio according to claim 3, characterized in that: denoising said live video images comprises first reducing the live video image to 1/4 of its original resolution and converting it to a 256-level grayscale image, then averaging corresponding pixels within a frame and between frames to smooth each frame and remove the random noise introduced during capture; applying light correction to said live video images means subtracting the mean of all pixel values in the image from each pixel value, dividing by the variance of all pixel values in the image, and then multiplying by a coefficient; extracting the global motion information comprises first subtracting each corresponding pixel of the previous frame from the current frame and taking the absolute value of the result to obtain an inter-frame difference image, then thresholding the difference image by judging whether each pixel is greater than or equal to, or less than, a fixed threshold to obtain a binary image describing the moving region, with 0 meaning less than and 1 meaning greater than or equal to, and finally applying edge extraction to the binary image to obtain the edges of the moving region; and extracting the local human-body posture information means further segmenting the moving foreground according to the global motion result, analyzing the features of each region separately, analyzing the shape of the edge contour in the specific region of the aforementioned binary image and the process of its change, extracting features with rotation and scaling invariance, deriving the corresponding posture information, and using the result from the previous moment for tracking, verification, and prediction.
CN200610081465A · 2006-05-22 (priority) · 2006-05-22 (filed) · A method for producing interactive digital multimedia based on video and audio · Expired - Fee Related · CN100596186C (en)

Priority Applications (1)

Application Number · Priority Date · Filing Date · Title
CN200610081465A · 2006-05-22 · 2006-05-22 · A method for producing interactive digital multimedia based on video and audio (CN100596186C (en))

Applications Claiming Priority (1)

Application Number · Priority Date · Filing Date · Title
CN200610081465A · 2006-05-22 · 2006-05-22 · A method for producing interactive digital multimedia based on video and audio (CN100596186C (en))

Publications (2)

Publication Number · Publication Date
CN101079996A (en) · 2007-11-28
CN100596186C (en) · 2010-03-24

Family

ID=38907185

Family Applications (1)

Application Number · Title · Priority Date · Filing Date
CN200610081465A · A method for producing interactive digital multimedia based on video and audio (Expired - Fee Related; CN100596186C (en)) · 2006-05-22 · 2006-05-22

Country Status (1)

Country · Link
CN · CN100596186C (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number · Priority date · Publication date · Assignee · Title
CN102436626A (en)* · 2010-11-19 · 2012-05-02 · 微软公司 · Computing cost per interaction for interactive advertising sessions
CN103186227A (en)* · 2011-12-28 · 2013-07-03 · 北京德信互动网络技术有限公司 · Man-machine interaction system and method
CN103186226A (en)* · 2011-12-28 · 2013-07-03 · 北京德信互动网络技术有限公司 · Man-machine interaction system and method
CN103905926A (en)* · 2014-04-14 · 2014-07-02 · 夷希数码科技(上海)有限公司 · Method and device for playing outdoor advertisement
CN104571516A (en)* · 2014-12-31 · 2015-04-29 · 武汉百景互动科技有限责任公司 · Interactive advertising system
WO2016026446A1 (en)* · 2014-08-19 · 2016-02-25 · 北京奇虎科技有限公司 · Implementation method for intelligent image pick-up system, intelligent image pick-up system and network camera
CN107197327A (en)* · 2017-06-26 · 2017-09-22 · 广州天翌云信息科技有限公司 · A method of producing digital media
CN109308625A (en)* · 2017-07-27 · 2019-02-05 · 掌游天下(北京)信息技术股份有限公司 · A kind of production method for playing advertisement, system and corresponding storage medium
CN110349576A (en)* · 2019-05-16 · 2019-10-18 · 国网上海市电力公司 · Power system operation instruction executing method, apparatus and system based on speech recognition
CN112348926A (en)* · 2020-11-23 · 2021-02-09 · 杭州美册科技有限公司 · Android-based video splicing app processing method and device


Also Published As

Publication number · Publication date
CN100596186C (en) · 2010-03-24

Similar Documents

Publication · Publication Date · Title
CN101079996A (en) · An interactive digital multimedia making method based on video and audio
CN111091824B (en) · A kind of voice matching method and related equipment
CN100345085C (en) · Method for controlling electronic game scene and role based on poses and voices of player
CN113516990B (en) · Voice enhancement method, neural network training method and related equipment
CN108073875A (en) · A kind of band noisy speech identifying system and method based on monocular cam
CN116934926B (en) · Recognition method and system based on multi-mode data fusion
CN114581812A (en) · Visual language recognition method, device, electronic device and storage medium
CN112001308A (en) · Lightweight behavior identification method adopting video compression technology and skeleton features
Ivanko et al. · RUSAVIC Corpus: Russian audio-visual speech in cars
CN118262114A (en) · A multi-modal real-time interactive decision-making method and system
CN116580720A (en) · Speaker vision activation interpretation method and system based on audio-visual voice separation
CN117177005A (en) · Method for generating video of flower batting based on multi-mode and dynamic visual angle adjustment
CN119380742A (en) · A multimodal speech enhancement system based on audio and video
CN110379130B (en) · Medical nursing anti-falling system based on multi-path high-definition SDI video
CN112669207A (en) · Method for enhancing resolution of face image based on television camera
CN1731833A (en) · A method for synthesizing audio-visual files with voice-driven head images
CN119323664A (en) · Attention mechanism guided electric power field ground oil water stain identification method
CN118781390A (en) · An intelligent identification device for Chinese herbal medicine
Wei et al. · Three-dimensional joint geometric-physiologic feature for lip-reading
CN116916089B (en) · Intelligent video editing method integrating voice features and face features
Zhao · A facial expression recognition method using two-stream convolutional networks in natural scenes
Kwon et al. · Real time character and speech commands recognition system
CN108109614A (en) · A kind of new robot band noisy speech identification device and method
Wang et al. · Are you speaking: Real-time speech activity detection via landmark pooling network
CN120529067B (en) · EEG signal-assisted video experience quality evaluation system

Legal Events

Date · Code · Title · Description
C06: Publication
PB01: Publication
C10: Entry into substantive examination
SE01: Entry into force of request for substantive examination
C14: Grant of patent or utility model
GR01: Patent grant
C56: Change in the name or address of the patentee

Owner name:BEIJING SHENGKAI INTERACTIVE TECHNOLOGY CO., LTD.

Free format text:FORMER NAME: BEIJING SHENGKAI INTERACTIVE ENTERTAINMENT TECHNOLOGY CO., LTD.

CP01: Change in the name or title of a patent holder

Address after:100080, Beijing, Zhichun Road, Haidian District, No. 63 satellite building, 9 floor

Patentee after:Beijing Interjoy Technology Limited

Address before:100080, Beijing, Zhichun Road, Haidian District, No. 63 satellite building, 9 floor

Patentee before:Beijing Interjoy Technology Limited

CF01: Termination of patent right due to non-payment of annual fee

Granted publication date:20100324

Termination date:20180522

