Image display method and device for a panoramic video conference, and video conferencing system
Technical field
The present invention relates to the technical field of video communication, and in particular to an image display method for a panoramic video conference, an image display device for a panoramic video conference, a video conferencing system, and a computer-readable storage medium.
Background art
Advances in camera and network technology have made remote real-time video conferencing possible. Existing video conferencing systems support synchronized transmission of audio and video, so the audio and video signals of a conference site can be sent to an audio/video player far from the site, helping remote participants take part in or listen in on the conference. Earlier video conferences mostly displayed a single flat image. Video shown this way has no sense of depth, and it is difficult to obtain a close-range frontal image of the person speaking, which greatly weakens the real-time interaction of the video conference. To display the conference scene as a panorama, multiple cameras are usually needed to shoot from different angles; the captured images are then switched on one video display, or one or more cameras are used to focus on a particular participant for a close-up display. Although switching among multiple cameras or showing close-ups in this way achieves panoramic video conferencing, it requires multiple cameras and a dedicated operator, which greatly increases the cost and complexity of the video conference.
Summary of the invention
Therefore, the technical problem to be solved by the present invention is that prior-art video conferencing systems for panoramic video conferencing are structurally complex and costly. To that end, the invention provides an image display method, a device, and a video conferencing system that can produce close-up displays based on a panoramic camera.
To this end, according to a first aspect, the present invention provides an image generating method for a panoramic video conference, comprising the following steps: obtaining panoramic video data of the conference site using a panoramic camera; performing humanoid detection on each video frame of the panoramic video data to obtain humanoid coordinate information for each video frame; obtaining, from the humanoid coordinate information, action information of a moving human body in each video frame; and cropping each video frame of the panoramic video data according to the action information to generate a video image of the moving human body.
Optionally, the image generating method for a panoramic video conference further comprises the following steps: locating the sound source at the conference site using a microphone array to obtain sound source position information; obtaining, from the sound source position information, sound source coordinate information of the sound source in the panoramic video data; taking the sound source coordinate information as speaker position information, or taking the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker position information; and cropping each video frame of the panoramic video data according to the speaker position information to generate a video image of the speaker.
Optionally, the image generating method for a panoramic video conference further comprises the following steps: performing face detection on each video frame of the panoramic video data to obtain face coordinate information for each video frame; and taking the face coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker position information.
According to a second aspect, the present invention further provides an image generating method for a panoramic video conference, comprising the following steps: receiving panoramic video data of the conference site; receiving action information of a moving human body in the panoramic video data, the action information comprising humanoid coordinate information of the moving human body in different video frames of the panoramic video data; and cropping each video frame of the panoramic video data according to the action information to generate a video image of the moving human body.
Optionally, the image generating method for a panoramic video conference further comprises the following steps: receiving speaker position information, the speaker position information being the sound source coordinate information in the panoramic video data, or the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment, or the face coordinate information that contains the sound source coordinate information of the corresponding moment; and cropping each video frame of the panoramic video data according to the speaker position information to generate a video image of the speaker.
Optionally, the image generating method for a panoramic video conference further comprises the following step: aligning the timestamps of the panoramic video data with those of the action information of the moving human body in the panoramic video data.
According to a third aspect, the present invention further provides a video generation device for a panoramic video conference, comprising: a video information obtaining module, configured to obtain panoramic video data of the conference site using a panoramic camera; a humanoid detection module, configured to perform humanoid detection on each video frame of the panoramic video data to obtain humanoid coordinate information for each video frame; an action information obtaining module, configured to obtain, from the humanoid coordinate information, action information of a moving human body in each video frame; and a first image generation module, configured to crop each video frame of the panoramic video data according to the action information to generate a video image of the moving human body.
Optionally, the video generation device for a panoramic video conference further comprises: a sound source localization module, configured to locate the sound source at the conference site using a microphone array to obtain sound source position information; a first speaker locating module, configured to obtain, from the sound source position information, sound source coordinate information of the sound source in the panoramic video data, and to take the sound source coordinate information as speaker position information, or to take the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker position information; and a second image generation module, configured to crop each video frame of the panoramic video data according to the speaker position information to generate a video image of the speaker.
Optionally, the video generation device for a panoramic video conference further comprises: a second speaker locating module, configured to perform face detection on each video frame of the panoramic video data to obtain face coordinate information for each video frame, and to take the face coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker position information.
According to a fourth aspect, the present invention further provides a video generation device for a panoramic video conference, comprising: a video information receiving module, configured to receive panoramic video data of the conference site; an action information receiving module, configured to receive action information of a moving human body in the panoramic video data, the action information comprising humanoid coordinate information of the moving human body in different video frames of the panoramic video data; and a third image generation module, configured to crop each video frame of the panoramic video data according to the action information to generate a video image of the moving human body.
Optionally, the video generation device for a panoramic video conference further comprises: a speaker information receiving module, configured to receive speaker position information, the speaker position information being the sound source coordinate information in the panoramic video data, or the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment, or the face coordinate information that contains the sound source coordinate information of the corresponding moment; and a fourth image generation module, configured to crop each video frame of the panoramic video data according to the speaker position information to generate a video image of the speaker.
According to a fifth aspect, the present invention provides a video conferencing system, comprising: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, cause the at least one processor to perform all or part of the method of the first aspect, or all or part of the method of the second aspect.
According to a sixth aspect, the present invention provides a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the steps of all or part of the method of the first aspect, or the steps of all or part of the method of the second aspect.
The technical solution provided by the embodiments of the present invention has the following advantages:
1. The image generating method for a panoramic video conference provided by the present invention comprises the following steps: obtaining panoramic video data of the conference site using a panoramic camera; performing humanoid detection on each video frame of the panoramic video data to obtain humanoid coordinate information for each video frame; obtaining, from the humanoid coordinate information, action information of a moving human body in each video frame; and cropping each video frame of the panoramic video data according to the action information to generate a video image of the moving human body. Obtaining the panoramic video data with a panoramic camera such as a fisheye camera avoids the prior-art need for multiple cameras shooting from different angles, and with it the structural complexity of a video conferencing system that obtains panoramic video data in that way. Meanwhile, humanoid detection yields the humanoid coordinate information of each video frame, the change in a given human body's humanoid coordinate information across video frames shows whether that body is a moving human body, and the humanoid coordinate information of a moving human body in each video frame is taken as its action information. Cropping the image of the moving human body from each video frame according to the action information then produces a focused video image of the moving human body at the conference site, so close-up display in panoramic video conferencing is achieved at low cost.
2. The image generating method for a panoramic video conference provided by the present invention further comprises the following steps: locating the sound source at the conference site using a microphone array to obtain sound source position information; obtaining, from the sound source position information, sound source coordinate information of the sound source in the panoramic video data; taking the sound source coordinate information as speaker position information, or taking the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker position information; and cropping each video frame of the panoramic video data according to the speaker position information to generate a video image of the speaker. The microphone array locates the sound source at the conference site, and the sound source position at a conference site is generally the speaker's position, so cropping the video image at the sound source position in each video frame of the panoramic video data generates a video image of the speaker. This achieves a close-up display of the speaker in panoramic video conferencing and enriches the close-up display functions of the method. In addition, taking the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker position information locates the speaker by both sound source coordinate information and humanoid coordinate information. This dual positioning prevents a non-human sound source at the conference site from being treated as the speaker, as could happen when the speaker is located by sound source coordinate information alone, and so improves the accuracy of speaker positioning.
3. In the image generating method for a panoramic video conference provided by the present invention, face detection is performed on each video frame of the panoramic video data to obtain face coordinate information for each video frame, and the face coordinate information that contains the sound source coordinate information of the corresponding moment is taken as the speaker position information. The speaker is thus located by both sound source coordinate information and face coordinate information. This dual positioning prevents a non-human sound source at the conference site from being treated as the speaker when only sound source coordinate information is used. It also prevents the case in which only the humanoid coordinate information containing the sound source coordinate information of the corresponding moment is used as the speaker position information and the speaker position cannot be determined accurately because the human body is occluded by an obstacle. The accuracy of speaker positioning is thereby further improved.
Brief description of the drawings
To illustrate the specific embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the specific embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the image generating method for a panoramic video conference provided in Embodiment 1;
Fig. 2 and Fig. 3 are specific examples of video image output modes;
Fig. 4 is a flowchart of the image generating method for a panoramic video conference provided in Embodiment 2;
Fig. 5 is a structural schematic diagram of the video generation device for a panoramic video conference provided in Embodiment 3;
Fig. 6 is a structural schematic diagram of the video generation device for a panoramic video conference provided in Embodiment 4;
Fig. 7 is a hardware structure diagram of the video conferencing system provided in Embodiment 5.
Detailed description of the embodiments
The technical solution of the present invention is described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "first", "second", and "third" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance.
Embodiment 1
This embodiment provides an image generating method for a panoramic video conference, as shown in Fig. 1. It should be noted that the steps shown in the flowchart can be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described can be executed in a different order. The process includes the following steps:
Step S100: obtain panoramic video data of the conference site using a panoramic camera. In this embodiment, a fisheye camera is used to obtain the panoramic video data of the conference site, and the panoramic video data are video data that have already undergone correction processing.
Step S200: perform humanoid detection on each video frame of the panoramic video data to obtain humanoid coordinate information for each video frame. In this embodiment, algorithms such as OpenCV AdaBoost can be used for humanoid detection. In a specific embodiment, the humanoid coordinate information is the coordinate information of a rectangular box containing the human shape; specifically, it is the coordinates of the rectangular box containing the human body in the coordinate system of the panoramic video data.
Step S300: obtain, from the humanoid coordinate information, action information of a moving human body in each video frame. In this embodiment, different human bodies are distinguished by the head-to-shoulder ratio or the outline of the detected human shape. Whether a human body is a moving human body is judged by comparing the humanoid coordinate information of the same human body in different video frames. Specifically, when the humanoid coordinate information of the same human body keeps changing across video frames, that human body is a moving human body, and its humanoid coordinate information in each video frame is the action information of the moving human body; when the humanoid coordinate information of the same human body is unchanged or essentially unchanged across video frames, that human body is not a moving human body.
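The judgment in step S300 can be sketched as follows. This is only a minimal illustration under stated assumptions: the box layout `(x, y, width, height)`, the function names, and the 20-pixel threshold `min_shift` are hypothetical and do not come from this embodiment, which says only that coordinates that stay "essentially unchanged" indicate a body that is not moving.

```python
from typing import List, Optional, Tuple

# Hypothetical box layout: (x, y, width, height) in panorama pixel coordinates.
Box = Tuple[int, int, int, int]


def box_center(box: Box) -> Tuple[float, float]:
    """Centre point of a humanoid rectangle."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def is_moving(track: List[Box], min_shift: float = 20.0) -> bool:
    """True if the box centre drifts at least min_shift pixels from its
    position in the first frame (min_shift is an assumed threshold)."""
    if len(track) < 2:
        return False
    x0, y0 = box_center(track[0])
    for box in track[1:]:
        x, y = box_center(box)
        if ((x - x0) ** 2 + (y - y0) ** 2) ** 0.5 >= min_shift:
            return True
    return False


def action_info(track: List[Box]) -> Optional[List[Box]]:
    """Per-frame boxes of a moving body, or None for a stationary one."""
    return list(track) if is_moving(track) else None


seated = [(100, 200, 80, 160), (101, 200, 80, 160), (100, 201, 80, 160)]
walking = [(100, 200, 80, 160), (130, 200, 80, 160), (170, 200, 80, 160)]
print(action_info(seated))   # stationary body: no action information
print(action_info(walking))  # moving body: its boxes in each frame
```

Comparing against the first frame rather than the previous frame is a design choice of this sketch; it keeps slow but steady movement from being missed.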
Step S400: crop each video frame of the panoramic video data according to the action information to generate a video image of the moving human body. In this embodiment, the humanoid coordinate information of the moving human body in each video frame, as contained in the action information, is used to crop the image at the corresponding position in that video frame, thereby generating the video image of the moving human body.
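As a rough sketch of the cropping in step S400, a frame can be modelled as a list of pixel rows and the humanoid rectangle as `(x, y, width, height)`. Both representations and the name `crop_frame` are assumptions made for illustration; a real implementation would operate on decoded image buffers.

```python
def crop_frame(frame, box):
    """Return the part of `frame` inside `box`, clamped to the frame edges.

    frame: list of rows, each row a list of pixel values (assumed layout).
    box:   (x, y, width, height) in the frame's own coordinate system.
    """
    x, y, w, h = box
    height = len(frame)
    width = len(frame[0]) if height else 0
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return [row[x0:x1] for row in frame[y0:y1]]


# A 4-row by 6-column toy frame whose pixel value encodes (row, column).
frame = [[r * 10 + c for c in range(6)] for r in range(4)]
print(crop_frame(frame, (1, 1, 3, 2)))  # [[11, 12, 13], [21, 22, 23]]
```

Applying `crop_frame` to every frame with that frame's box from the action information yields the sequence of images that makes up the focused video of the moving human body.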
In this embodiment, the user of a terminal can choose the locally output video picture according to their own needs. Specifically, the user can display the video image in picture-in-picture mode (one large picture plus one small picture). As shown in Fig. 2, the large picture can show the panorama of the conference scene while the small picture shows the focused view of the moving human body, that is, a tracking view of a person who has newly entered the meeting room. Alternatively, as shown in Fig. 3, the user can have the large picture show the focused view of the moving human body and the small picture show the panorama of the conference scene.
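The picture-in-picture output described above can be illustrated with a small compositing sketch. The placement of the small picture in the bottom-right corner, the `margin` parameter, and the list-of-rows image layout are all assumptions of this sketch; the embodiment does not specify where the small picture sits, and scaling the pictures to size is left out.

```python
def picture_in_picture(big, small, margin=1):
    """Overlay `small` on a copy of `big`, anchored bottom-right.

    Images are lists of pixel rows (assumed layout); `small` must fit
    inside `big` with `margin` pixels to spare on the right and bottom.
    """
    big_h, big_w = len(big), len(big[0])
    small_h, small_w = len(small), len(small[0])
    top = big_h - small_h - margin
    left = big_w - small_w - margin
    if top < 0 or left < 0:
        raise ValueError("small picture does not fit inside big picture")
    out = [row[:] for row in big]  # copy so the input frame is untouched
    for r in range(small_h):
        out[top + r][left:left + small_w] = small[r]
    return out


panorama = [[0] * 5 for _ in range(4)]  # stand-in for the large picture
focus = [[1, 1], [1, 1]]                # stand-in for the small picture
for row in picture_in_picture(panorama, focus):
    print(row)
```

Swapping the two arguments (after resizing) corresponds to the Fig. 3 arrangement, where the focused view is the large picture and the panorama is the small one.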
It should be noted that the above is the data processing flow when the data acquisition terminal of the video conferencing system also serves as the image output terminal. When the data acquisition terminal does not output images, it executes steps S100 to S300 and then sends the panoramic video data and the action information of the moving human body to the video playback terminal, which crops each video frame of the panoramic video data according to the action information to generate the video image of the moving human body. In this embodiment, the panoramic video data must be video-encoded before transmission. The encoding format can be any of various formats, for example H.263, H.264, H.265, MPEG-4, or VP8, and the action information is also encoded as necessary. The panoramic video data that are encoded and the panoramic video data that undergo humanoid detection can be the same copy of the data, in which case detection and encoding are processed serially, or two separate copies, in which case detection and encoding are processed in parallel. Which approach is used depends on the capability of the chip in the data acquisition terminal: if the chip has enough computing power that humanoid detection adds negligible delay, the same copy can be detected and then encoded serially; if humanoid detection on the chip introduces a long delay, two copies of the data are detected and encoded separately.
In this embodiment, the encoded panoramic video data and action information can be packed together and sent to the video playback end on the same channel, or the encoded panoramic video data can be sent on one channel and the encoded action information on another. When the encoded panoramic video data and action information are sent to the video playback end on the same channel, the combination of the encoded panoramic video data and action information needs to be aligned on the basis of timestamps. In this embodiment, the panoramic video data are packed and transmitted using standard video conferencing packing and transport protocols, to maximize compatibility with traditional video conference terminals and entities. The data acquisition terminal can therefore carry out the method of this embodiment with a video playback end that supports close-up display, holding a remote real-time video conference with close-up pictures, and can also hold an ordinary remote real-time video conference with a traditional video playback end that does not support close-up display.
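Timestamp-based alignment of video frames and action information can be sketched as a nearest-timestamp lookup. The millisecond units, the 20 ms `tolerance_ms`, and the function name are assumptions for illustration; the embodiment states only that alignment is done on the basis of timestamps.

```python
from bisect import bisect_left


def align_by_timestamp(frame_ts, info, tolerance_ms=20):
    """Pair each frame timestamp with the nearest action-information entry.

    frame_ts: frame timestamps in milliseconds, ascending.
    info:     list of (timestamp_ms, box) tuples, ascending by timestamp.
    Returns a list parallel to frame_ts holding a box, or None when no
    entry lies within tolerance_ms (an assumed tolerance).
    """
    info_ts = [t for t, _ in info]
    aligned = []
    for ts in frame_ts:
        i = bisect_left(info_ts, ts)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(info)]
        best = min(candidates, key=lambda j: abs(info_ts[j] - ts), default=None)
        if best is not None and abs(info_ts[best] - ts) <= tolerance_ms:
            aligned.append(info[best][1])
        else:
            aligned.append(None)
    return aligned


frames = [0, 40, 80, 120]
boxes = [(0, "box_a"), (42, "box_b"), (121, "box_c")]
print(align_by_timestamp(frames, boxes))  # ['box_a', 'box_b', None, 'box_c']
```

A frame that receives `None` has no usable action information; the playback end could, for example, reuse the previous box or show the uncropped panorama for that frame.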
In the image generating method for a panoramic video conference provided in this embodiment, obtaining the panoramic video data with a panoramic camera such as a fisheye camera avoids the prior-art need for multiple cameras shooting from different angles, and with it the structural complexity of a video conferencing system that obtains panoramic video data in that way. Meanwhile, humanoid detection yields the humanoid coordinate information of each video frame, the change in a given human body's humanoid coordinate information across video frames shows whether that body is a moving human body, and the humanoid coordinate information of a moving human body in each video frame is taken as its action information. Cropping the image of the moving human body from each video frame according to the action information then produces a focused video image of the moving human body at the conference site, so close-up display in panoramic video conferencing is achieved at low cost.
In an optional embodiment, the image generating method for a panoramic video conference further includes the following steps:
Step S500: locate the sound source at the conference site using a microphone array to obtain sound source position information. In this embodiment, the sound source position information is obtained from the time difference information with which the current speech reaches the multiple microphones of the microphone array, together with the position information of those microphones.
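How a time difference and microphone positions yield a direction can be illustrated with the simplest case: two microphones and a distant source. This far-field, two-microphone model is an assumption chosen for illustration, not the localization algorithm of this embodiment, and the constant and function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # metres per second, approximate value in air at 20 °C


def arrival_angle_deg(delay_s, mic_spacing_m):
    """Angle of arrival from the delay between two microphones.

    Far-field model: sin(angle) = c * delay / spacing, where the angle is
    measured from the direction perpendicular to the line joining the mics.
    A positive delay means the sound reached the reference microphone first.
    """
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    return math.degrees(math.asin(ratio))


spacing = 0.1  # metres between the two microphones (example value)
print(arrival_angle_deg(0.0, spacing))  # source straight ahead: 0.0
print(arrival_angle_deg(0.5 * spacing / SPEED_OF_SOUND, spacing))  # about 30 degrees
```

A single microphone pair cannot distinguish front from back; an array with more microphones, as in this embodiment, resolves that ambiguity by combining several pairwise delays.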
Step S600: obtain, from the sound source position information, sound source coordinate information of the sound source in the panoramic video data. In this embodiment, the sound source coordinate information is taken as the speaker position information, or the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment is taken as the speaker position information. The sound source position information and the position information of the fisheye camera give the angle of the sound source position relative to the fisheye camera position, from which the sound source coordinate information in the coordinate system of the panoramic video data is obtained. The sound source coordinate information and the humanoid coordinate information obtained in step S200 are coordinate information in the same coordinate system.
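One way to picture the conversion from a sound source angle to a panorama coordinate is sketched below. It assumes the corrected panorama is a 360-degree strip in which the horizontal pixel position is proportional to the azimuth around the camera; this projection, the `zero_column` offset, and the function name are assumptions of the sketch, since the embodiment does not specify the form of the correction.

```python
def azimuth_to_column(azimuth_deg, panorama_width, zero_column=0):
    """Map a sound source azimuth to a pixel column of the panorama.

    azimuth_deg:    angle of the source around the camera, in degrees.
    panorama_width: width of the unrolled 360-degree panorama in pixels.
    zero_column:    column that corresponds to azimuth 0 (assumed offset).
    """
    fraction = (azimuth_deg % 360.0) / 360.0
    return (zero_column + int(fraction * panorama_width)) % panorama_width


print(azimuth_to_column(90, 3600))   # 900
print(azimuth_to_column(-90, 3600))  # 2700, because -90 wraps to 270 degrees
```

This gives only the horizontal coordinate. The vertical coordinate of the sound source point would need an elevation estimate or a fixed row chosen by the system, which this sketch does not model.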
In this embodiment, the sound source localization result is a single position point, so the sound source coordinate information is one coordinate point in the coordinate system of the panoramic video data. The humanoid detection result is a rectangular box containing the human shape, so the humanoid coordinate information is the set of all coordinate points within that rectangular box in the coordinate system of the panoramic video data. When a human body is the speaker, the sound source position should coincide with the position of that human body, and the resulting sound source coordinate should be one of the coordinates in the humanoid coordinate set. Therefore, when the humanoid coordinate information of a human body in a video frame contains the sound source coordinate information of the corresponding moment, that human body can be identified as the speaker, and its humanoid coordinate information is the speaker position information.
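The containment check described above amounts to a point-in-rectangle test. The following sketch assumes boxes are `(x, y, width, height)` tuples and the sound source is an `(x, y)` point in the same coordinate system; the function names are hypothetical.

```python
def contains(box, point):
    """True if `point` lies inside the rectangle `box`."""
    x, y, w, h = box
    px, py = point
    return x <= px < x + w and y <= py < y + h


def locate_speaker(humanoid_boxes, sound_point):
    """Return the humanoid box that contains the sound source point.

    Returns None when no detected human shape contains the point, for
    example when the sound comes from a non-human source.
    """
    for box in humanoid_boxes:
        if contains(box, sound_point):
            return box
    return None


boxes = [(0, 0, 100, 200), (300, 50, 80, 160)]
print(locate_speaker(boxes, (320, 100)))  # (300, 50, 80, 160)
print(locate_speaker(boxes, (200, 100)))  # None
```

Returning `None` when nothing matches is what lets the dual check reject sounds that do not originate from a detected person.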
Step S700: crop each video frame of the panoramic video data according to the speaker position information to generate a video image of the speaker.
In this embodiment, the user of a terminal can choose the locally output video picture according to their own needs. Specifically, the user can display the video image in picture-in-picture mode (one large picture plus one small picture), can combine the panorama, the focused view of the moving human body, and the focused view of the speaker in any way they need, and can switch among them at any time during the conference.
It should be noted that the above is likewise the data processing flow when the data acquisition terminal of the video conferencing system also serves as the image output terminal. When the data acquisition terminal does not output images, it executes steps S500 and S600 and then sends the panoramic video data and the speaker position information to the video playback terminal, which crops each video frame of the panoramic video data according to the speaker position information to generate the video image of the speaker. Before transmission, the panoramic video data must again be video-encoded and the speaker position information encoded as necessary. The specific encoding and sending methods are the same as those described above for the action information and are not repeated here.
In the image generating method for a panoramic video conference provided in this embodiment, the microphone array locates the sound source at the conference site to obtain sound source position information. The sound source position at a conference site is generally the speaker's position, so cropping the video image at the sound source position in each video frame of the panoramic video data generates a video image of the speaker. This achieves a close-up display of the speaker in panoramic video conferencing and enriches the close-up display functions of the method. In addition, taking the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker position information locates the speaker by both sound source coordinate information and humanoid coordinate information. This dual positioning prevents a non-human sound source at the conference site from being treated as the speaker, as could happen when the speaker is located by sound source coordinate information alone, and so improves the accuracy of speaker positioning.
In an optional embodiment, the image generating method for a panoramic video conference further includes the following steps:
Step S800: perform face detection on each video frame of the panoramic video data to obtain face coordinate information for each video frame. In this embodiment, the face coordinate information is the coordinate information of a rectangular box containing the face; specifically, it is the coordinates of the rectangular box containing the face in the coordinate system of the panoramic video data. The face coordinate information, the sound source coordinate information obtained in step S600, and the humanoid coordinate information obtained in step S200 are all coordinate information in the same coordinate system.
In this embodiment, the sound source localization result is a single position point, so the sound source coordinate information is one coordinate point in the coordinate system of the panoramic video data. The face detection result is a rectangular box containing the face, so the face coordinate information is the set of all coordinate points within that rectangular box in the coordinate system of the panoramic video data. When a human body is the speaker, the sound source position should coincide with the position of that person's face, and the resulting sound source coordinate should be one of the coordinates in the face coordinate set. Therefore, when the face coordinate information of a human body in a video frame contains the sound source coordinate information of the corresponding moment, that human body can be identified as the speaker, and its face coordinate information is the speaker position information.
In this embodiment, algorithms such as AdaBoost, Viola-Jones, or a CNN can be used for face detection.
In the image generating method for a panoramic video conference provided in this embodiment, face detection is performed on each video frame of the panoramic video data, and the face coordinate information that contains the sound source coordinate information of the corresponding moment is taken as the speaker position information. The speaker is thus located by both sound source coordinate information and face coordinate information. This dual positioning prevents a non-human sound source at the conference site from being treated as the speaker when only sound source coordinate information is used. It also prevents the case in which only the humanoid coordinate information containing the sound source coordinate information of the corresponding moment is used as the speaker position information and the speaker position cannot be determined accurately because the human body is occluded by an obstacle. The accuracy of speaker positioning is thereby further improved.
Embodiment 2
This embodiment provides an image generation method for a panoramic video conference, as shown in Figure 4. It should be noted that this method is the method embodiment at the video playing end corresponding to Embodiment 1; what has already been explained is not repeated. In addition, the steps illustrated in the flowchart of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from the one given here. The process includes the following steps:
Step S10: receive the panoramic video data of the conference site. In this embodiment, the received panoramic video data is encoded panoramic video data, so it must be decoded before a video image is generated.
Step S20: receive the action information of a moving human body in the panoramic video data. In this embodiment, the action information includes the humanoid coordinate information of the moving human body in different video frames of the panoramic video data. In this embodiment, the received action information is likewise encoded, so it must also be decoded before a video image is generated.
Step S30: perform image cropping on each video frame in the panoramic video data according to the action information to generate the video image of the moving human body.
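Step S30 can be sketched as follows. This is only an illustration under stated assumptions: a decoded frame is modeled as a list of pixel rows, the action information is modeled as a mapping from frame index to a humanoid rectangle (x, y, width, height), and the function names `crop_frame` and `generate_moving_body_video` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x, y, width, height), an assumed layout
Frame = Sequence[Sequence[int]]   # rows of pixel values, an assumed frame model


def crop_frame(frame: Frame, box: Box) -> List[List[int]]:
    """Crop one frame to the rectangle, clamping the rectangle to the frame bounds."""
    x, y, w, h = box
    frame_height = len(frame)
    frame_width = len(frame[0]) if frame_height else 0
    left, top = max(0, x), max(0, y)
    right, bottom = min(frame_width, x + w), min(frame_height, y + h)
    return [list(row[left:right]) for row in frame[top:bottom]]


def generate_moving_body_video(frames: Sequence[Frame],
                               action_info: Dict[int, Box]) -> List[List[List[int]]]:
    """Crop every frame that has a humanoid rectangle, in frame order."""
    return [
        crop_frame(frames[index], box)
        for index, box in sorted(action_info.items())
        if 0 <= index < len(frames)
    ]
```

Clamping the rectangle to the frame bounds is a design choice of this sketch, so that a rectangle partly outside the frame still yields a valid cropped image.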
In this embodiment, the user can also independently select the video picture to be output locally according to his or her own needs. The specific selectable output modes are the same as in Embodiment 1 and are not described again here.
In an optional embodiment, the image generation method for a panoramic video conference further includes the following steps:
Step S40: receive the speaker location information. In this embodiment, the speaker location information is the sound source coordinate information in the panoramic video data, or the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment, or the face coordinate information that contains the sound source coordinate information of the corresponding moment.
Step S50: perform image cropping on each video frame in the panoramic video data according to the speaker location information to generate the video image of the speaker.
In an optional embodiment, the image generation method for a panoramic video conference further includes the following steps:
Step S60: align the timestamps of the panoramic video data with the timestamps of the action information of a moving human body in the panoramic video data. In this embodiment, when the panoramic video data and the action information are received over two separate channels, their timestamps must be aligned before a video image is generated. Similarly, when the panoramic video data and the speaker location information are received over two separate channels, the timestamps of the panoramic video data and of the speaker location information must also be aligned before a video image is generated.
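The specification does not state how the timestamps are aligned. One possible approach, shown here only as a sketch, is nearest-timestamp matching within a tolerance. The millisecond units, the 20 ms default tolerance, and the name `align_by_timestamp` are assumptions made for illustration.

```python
from bisect import bisect_left
from typing import Any, List, Optional, Sequence, Tuple


def align_by_timestamp(frame_timestamps: Sequence[int],
                       records: Sequence[Tuple[int, Any]],
                       tolerance_ms: int = 20) -> List[Optional[Any]]:
    """Pair each frame timestamp with the nearest record payload.

    `records` holds (timestamp, payload) pairs sorted by ascending timestamp,
    for example action information or speaker location information received
    over a second channel. A frame with no record within `tolerance_ms`
    is paired with None.
    """
    record_timestamps = [ts for ts, _ in records]
    aligned: List[Optional[Any]] = []
    for ts in frame_timestamps:
        pos = bisect_left(record_timestamps, ts)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(records)]
        best = min(candidates, key=lambda i: abs(record_timestamps[i] - ts), default=None)
        if best is not None and abs(record_timestamps[best] - ts) <= tolerance_ms:
            aligned.append(records[best][1])
        else:
            aligned.append(None)
    return aligned
```

Frames paired with `None` have no usable metadata; how such frames are handled (for example, by reusing the previous rectangle) is left open here.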
Embodiment 3
This embodiment provides an image generation device for a panoramic video conference. The device is used to implement Embodiment 1 above and its preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
This embodiment provides an image generation device for a panoramic video conference which, as shown in Figure 5, comprises: a video information acquisition module 100, a humanoid detection module 200, an action information acquisition module 300, and a first image generation module 400.
The video information acquisition module 100 is used to obtain the panoramic video data of the conference site using a panoramic camera. The humanoid detection module 200 is used to perform humanoid detection on each video frame in the panoramic video data to obtain the humanoid coordinate information of each video frame in the panoramic video data. The action information acquisition module 300 is used to obtain, according to the humanoid coordinate information, the action information of a moving human body in each video frame. The first image generation module 400 is used to perform image cropping on each video frame in the panoramic video data according to the action information to generate the video image of the moving human body.
In an optional embodiment, the image generation device for a panoramic video conference further comprises: a sound source localization module 500, a first speaker locating module 600, and a second image generation module 700.
The sound source localization module 500 is used to locate the sound source at the conference site using a microphone array to obtain sound source position information. The first speaker locating module 600 is used to obtain, according to the sound source position information, the sound source coordinate information of the sound source in the panoramic video data, and to take the sound source coordinate information as the speaker location information, or to take the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker location information. The second image generation module 700 is used to perform image cropping on each video frame in the panoramic video data according to the speaker location information to generate the video image of the speaker.
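This section does not specify how the first speaker locating module 600 converts sound source position information into a coordinate in the panoramic video data. As a purely illustrative sketch, the following assumes that the microphone array reports a direction (azimuth and elevation in degrees), that the panoramic frame uses an equirectangular projection, and that azimuth 0 maps to the left edge of the frame; none of these assumptions, nor the name `sound_source_to_pixel`, comes from the specification.

```python
from typing import Tuple


def sound_source_to_pixel(azimuth_deg: float, elevation_deg: float,
                          frame_width: int, frame_height: int) -> Tuple[int, int]:
    """Map a sound source direction to a pixel in an equirectangular frame.

    Assumed conventions: azimuth 0..360 spans the frame width from left to
    right, elevation +90 is the top row, and elevation -90 is the bottom row.
    """
    azimuth = azimuth_deg % 360.0                      # wrap into [0, 360)
    elevation = max(-90.0, min(90.0, elevation_deg))   # clamp to the valid range
    x = int(azimuth / 360.0 * frame_width) % frame_width
    y = min(frame_height - 1, int((90.0 - elevation) / 180.0 * frame_height))
    return x, y
```

A real system would additionally need calibration between the microphone array and the panoramic camera, which is outside the scope of this sketch.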
In an optional embodiment, the image generation device for a panoramic video conference further comprises a second speaker locating module, which is used to perform face detection on each video frame in the panoramic video data to obtain the face coordinate information of each video frame in the panoramic video data, and to take the face coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker location information.
Embodiment 4
This embodiment provides an image generation device for a panoramic video conference. The device is used to implement Embodiment 2 above and its preferred implementations; what has already been described is not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the device described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
This embodiment provides an image generation device for a panoramic video conference which, as shown in Figure 6, comprises: a video information receiving module 10, an action information receiving module 20, and a third image generation module 30.
The video information receiving module 10 is used to receive the panoramic video data of the conference site. The action information receiving module 20 is used to receive the action information of a moving human body in the panoramic video data, where the action information includes the humanoid coordinate information of the moving human body in different video frames of the panoramic video data. The third image generation module 30 is used to perform image cropping on each video frame in the panoramic video data according to the action information to generate the video image of the moving human body.
In an optional embodiment, the image generation device for a panoramic video conference further comprises: a speaker information receiving module 40 and a fourth image generation module 50.
The speaker information receiving module 40 is used to receive the speaker location information. The speaker location information is the sound source coordinate information in the panoramic video data, or the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment, or the face coordinate information that contains the sound source coordinate information of the corresponding moment. The fourth image generation module 50 is used to perform image cropping on each video frame in the panoramic video data according to the speaker location information to generate the video image of the speaker.
In an optional embodiment, the image generation device for a panoramic video conference further comprises a timestamp alignment module, which is used to align the timestamps of the panoramic video data with the timestamps of the action information of a moving human body in the panoramic video data.
Embodiment 5
An embodiment of the present invention provides a video conferencing system. As shown in Figure 7, the video conferencing system may include: at least one processor 701, such as a CPU (Central Processing Unit), at least one communication interface 703, a memory 704, and at least one communication bus 702. The communication bus 702 is used to implement connection and communication between these components. The communication interface 703 may include a display screen (Display) and a keyboard (Keyboard), and optionally may also include a standard wired interface and a wireless interface. The memory 704 may be a high-speed RAM (Random Access Memory, a volatile random access memory) or a non-volatile memory, for example at least one magnetic disk memory. Optionally, the memory 704 may also be at least one storage device located remotely from the aforementioned processor 701. An application program is stored in the memory 704, and the processor 701 calls the program code stored in the memory 704 to execute the method steps of either Embodiment 1 or Embodiment 2, that is, to perform the following operations:
Obtain the panoramic video data of the conference site using a panoramic camera; perform humanoid detection on each video frame in the panoramic video data to obtain the humanoid coordinate information of each video frame in the panoramic video data; obtain, according to the humanoid coordinate information, the action information of a moving human body in each video frame; and perform image cropping on each video frame in the panoramic video data according to the action information to generate the video image of the moving human body.
In the embodiment of the present invention, the processor 701 calls the program code in the memory 704 and is further configured to perform the following operations: locate the sound source at the conference site using a microphone array to obtain sound source position information; obtain, according to the sound source position information, the sound source coordinate information of the sound source in the panoramic video data; take the sound source coordinate information as the speaker location information, or take the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker location information; and perform image cropping on each video frame in the panoramic video data according to the speaker location information to generate the video image of the speaker.
In the embodiment of the present invention, the processor 701 calls the program code in the memory 704 and is further configured to perform the following operations: perform face detection on each video frame in the panoramic video data to obtain the face coordinate information of each video frame in the panoramic video data; and take the face coordinate information that contains the sound source coordinate information of the corresponding moment as the speaker location information.
In the embodiment of the present invention, the processor 701 calls the program code in the memory 704 and is further configured to perform the following operations: receive the action information of a moving human body in the panoramic video data, where the action information includes the humanoid coordinate information of the moving human body in different video frames of the panoramic video data; and perform image cropping on each video frame in the panoramic video data according to the action information to generate the video image of the moving human body.
In the embodiment of the present invention, the processor 701 calls the program code in the memory 704 and is further configured to perform the following operations: receive the speaker location information, where the speaker location information is the sound source coordinate information in the panoramic video data, or the humanoid coordinate information that contains the sound source coordinate information of the corresponding moment, or the face coordinate information that contains the sound source coordinate information of the corresponding moment; and perform image cropping on each video frame in the panoramic video data according to the speaker location information to generate the video image of the speaker.
In the embodiment of the present invention, the processor 701 calls the program code in the memory 704 and is further configured to perform the following operation: align the timestamps of the panoramic video data with the timestamps of the action information of a moving human body in the panoramic video data.
The communication bus 702 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 702 may be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one line is shown in Figure 7, but this does not mean that there is only one bus or only one type of bus.
The memory 704 may include volatile memory, such as random-access memory (RAM). The memory may also include non-volatile memory, such as flash memory, a hard disk drive (HDD), or a solid-state drive (SSD). The memory 704 may also include a combination of the above kinds of memory.
The processor 701 may be a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP.
The processor 701 may further include a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL) device, or any combination thereof.
Embodiment 6
An embodiment of the present invention also provides a non-transitory computer storage medium. The computer storage medium stores computer-executable instructions, and the computer-executable instructions can execute the method steps of either Embodiment 1 or Embodiment 2. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), a flash memory, a hard disk drive (HDD), a solid-state drive (SSD), or the like. The storage medium may also include a combination of the above kinds of memory.
Obviously, the above embodiments are merely examples given for clarity of description and do not limit the implementations. Those of ordinary skill in the art can make other variations or changes in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Obvious variations or changes derived from the above remain within the protection scope of the invention.