CN107613161A - Video data processing method, device, and computing device based on virtual world - Google Patents


Info

Publication number
CN107613161A
CN107613161A
Authority
CN
China
Prior art keywords
processed
video data
frame image
scene
specific object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710948050.6A
Other languages
Chinese (zh)
Inventor
眭帆
眭一帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201710948050.6A
Publication of CN107613161A
Status: Pending

Abstract

The invention discloses a virtual-world-based video data processing method, device, and computing device. The method includes: acquiring video data; screening the video data to obtain to-be-processed frame images that contain a specific object; performing scene segmentation on each to-be-processed frame image to obtain a foreground image of the specific object; rendering a three-dimensional scene; extracting key information of the specific object from the to-be-processed frame image and deriving, from that key information, the position of the foreground image in the three-dimensional scene; fusing the foreground image with the three-dimensional scene according to the position information to obtain a processed frame image; and overwriting the to-be-processed frame image with the processed frame image to obtain processed video data. The invention uses a deep learning method to perform scene segmentation efficiently and accurately. It places no demands on the user's technical skill: the video is processed automatically, with no manual editing, which greatly saves the user's time.

Description

Translated from Chinese
Video data processing method, device, and computing device based on virtual world

Technical Field

The present invention relates to the field of image processing, and in particular to a virtual-world-based video data processing method, device, and computing device.

Background Art

With the development of science and technology, image capture devices keep improving: recorded video is clearer, and its resolution and display quality have risen sharply. However, an existing recording is just the raw footage itself and cannot satisfy users' growing demand for personalization. In the prior art, a user can manually post-process a video after recording it, but doing so requires advanced image-editing skills and a large investment of the user's time; the workflow is tedious and technically complex.

Therefore, a virtual-world-based video data processing method is needed that meets users' personalization needs while lowering the technical barrier.

Summary of the Invention

In view of the above problems, the present invention is proposed to provide a virtual-world-based video data processing method, device, and computing device that overcome, or at least partially solve, the above problems.

According to one aspect of the present invention, a virtual-world-based video data processing method is provided, including:

acquiring video data;

screening the video data to obtain a to-be-processed frame image containing a specific object;

performing scene segmentation on the to-be-processed frame image to obtain a foreground image of the specific object;

rendering a three-dimensional scene;

extracting key information of the specific object from the to-be-processed frame image, and obtaining position information of the foreground image in the three-dimensional scene according to the key information;

fusing the foreground image with the three-dimensional scene according to the position information to obtain a processed frame image;

overwriting the to-be-processed frame image with the processed frame image to obtain processed video data.
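The steps above can be sketched as a single processing loop. This is a minimal illustration under invented assumptions, not the patented implementation: frames are plain dicts, and every helper below is a hypothetical stand-in for the real screening, segmentation, positioning, and fusion components.

```python
# Hypothetical sketch of the claimed pipeline. Frames are dicts with a
# "person" flag and a payload; real frames would be pixel arrays.

def segment_foreground(frame):
    return {"fg": frame["payload"]}           # stand-in for deep-learning segmentation

def extract_key_info(frame):
    return frame.get("keypoints", [])         # stand-in for keypoint extraction

def position_in_scene(keypoints, scene):
    return (0.0, 0.0, 5.0)                    # stand-in for the position mapping

def fuse(fg, scene, pos):
    return {"fused": (fg["fg"], scene, pos)}  # stand-in for fusion

def process_video(frames, scene):
    """Screen frames, segment/locate/fuse the ones containing the
    specific object, and overwrite them in place."""
    out = list(frames)
    for i, frame in enumerate(frames):
        if not frame.get("person"):           # screening: skip frames without the object
            continue
        fg = segment_foreground(frame)        # scene segmentation
        keys = extract_key_info(frame)        # key information extraction
        pos = position_in_scene(keys, scene)  # position in the 3D scene
        out[i] = fuse(fg, scene, pos)         # fusion, then overwrite the frame
    return out
```

Frames without the specific object pass through untouched, which matches the claim that only the to-be-processed frames are replaced.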

Optionally, acquiring video data further includes:

acquiring local video data and/or network video data.

Optionally, acquiring video data further includes:

acquiring video data synthesized from multiple local pictures and/or multiple network pictures.

Optionally, screening the video data to obtain a to-be-processed frame image containing a specific object further includes:

screening the video data within a user-specified time period to obtain a to-be-processed frame image containing the specific object.

Optionally, the key information is key point information;

extracting key information of the specific object from the to-be-processed frame image, and obtaining position information of the foreground image in the three-dimensional scene according to the key information, further includes:

extracting key point information located on the specific object from the to-be-processed frame image.

Optionally, extracting key information of the specific object from the to-be-processed frame image, and obtaining position information of the foreground image in the three-dimensional scene according to the key information, further includes:

calculating, according to the key point information of the specific object, the distance between at least two key points having a symmetric relationship;

obtaining depth position information of the foreground image in the three-dimensional scene according to the distance between the at least two symmetric key points.
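The symmetric-keypoint idea can be illustrated with an inverse proportionality: the farther the object is from the camera, the smaller the on-screen distance between two symmetric keypoints (for example the two eyes). The calibration constants below are invented for the sketch, not values from the patent.

```python
import math

REF_PIXEL_DISTANCE = 60.0  # assumed on-screen eye distance at the reference depth
REF_DEPTH = 5.0            # assumed scene depth (z) at that pixel distance

def depth_from_symmetric_keypoints(p_left, p_right):
    """Model depth as inversely proportional to the pixel distance
    between two symmetric keypoints of the specific object."""
    d = math.hypot(p_left[0] - p_right[0], p_left[1] - p_right[1])
    return REF_DEPTH * REF_PIXEL_DISTANCE / d
```

With these constants, a keypoint pair 60 px apart maps to depth 5.0, and a pair twice as far apart (object closer to the camera) maps to half that depth.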

Optionally, extracting key information of the specific object from the to-be-processed frame image, and obtaining position information of the foreground image in the three-dimensional scene according to the key information, further includes:

obtaining position information of the specific object within the to-be-processed frame image according to the key point information;

obtaining left-right position information of the foreground image in the three-dimensional scene according to the position information of the specific object within the to-be-processed frame image.
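The left-right mapping can be sketched as a linear interpolation from the object's horizontal position in the frame onto the scene's x-axis range. The function name and ranges are illustrative assumptions, not part of the patent.

```python
def horizontal_position(object_center_x, frame_width, scene_x_min, scene_x_max):
    """Linearly map the object's horizontal position in the frame
    (0..frame_width) onto the x-axis range of the 3D scene."""
    t = object_center_x / frame_width
    return scene_x_min + t * (scene_x_max - scene_x_min)
```

An object centered in a 640 px frame lands in the middle of the scene's x range; an object at the left frame edge lands at the scene's left boundary.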

Optionally, the method further includes:

acquiring terrain information of the three-dimensional scene;

obtaining up-down position information of the foreground image in the three-dimensional scene according to the terrain information of the scene and the foreground image's left-right and/or depth position information in the scene.
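One plausible reading of this claim is a height-map lookup: given the (x, z) placement derived from the left-right and depth positions, the terrain tells you the ground height (y) so the object stands on the ground rather than floating. The grid representation below is an assumption for illustration.

```python
def vertical_position(height_map, x, z):
    """Look up the terrain height at the foreground's (x, z) placement;
    height_map is assumed to be a z-major grid of ground heights."""
    return height_map[round(z)][round(x)]
```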

Optionally, fusing the foreground image with the three-dimensional scene according to the position information to obtain a processed frame image further includes:

fusing the foreground image with the three-dimensional scene according to the foreground image's depth, left-right, and/or up-down position information in the scene to obtain a processed frame image.

Optionally, before the processed frame image is obtained, the method further includes:

drawing an effect map on a specific region of the specific object in the foreground image.

Optionally, the three-dimensional scene contains weather information that changes in real time.

Optionally, the three-dimensional scene contains variable lighting information.

Optionally, the method further includes:

uploading the processed video data to one or more cloud video platform servers, so that the cloud video platform servers can display the video data on the cloud video platform.

According to another aspect of the present invention, a virtual-world-based video data processing device is provided, including:

an acquisition module adapted to acquire video data;

a screening module adapted to screen the video data and obtain a to-be-processed frame image containing a specific object;

a segmentation module adapted to perform scene segmentation on the to-be-processed frame image and obtain a foreground image of the specific object;

a rendering module adapted to render a three-dimensional scene;

an extraction module adapted to extract key information of the specific object from the to-be-processed frame image and obtain position information of the foreground image in the three-dimensional scene according to the key information;

a fusion module adapted to fuse the foreground image with the three-dimensional scene according to the position information and obtain a processed frame image;

an overwriting module adapted to overwrite the to-be-processed frame image with the processed frame image and obtain processed video data.

Optionally, the acquisition module is further adapted to:

acquire local video data and/or network video data.

Optionally, the acquisition module is further adapted to:

acquire video data synthesized from multiple local pictures and/or multiple network pictures.

Optionally, the screening module is further adapted to:

screen the video data within a user-specified time period and obtain a to-be-processed frame image containing the specific object.

Optionally, the key information is key point information;

the extraction module is further adapted to extract key point information located on the specific object from the to-be-processed frame image.

Optionally, the extraction module further includes:

a first position module adapted to calculate, according to the key point information of the specific object, the distance between at least two key points having a symmetric relationship, and to obtain depth position information of the foreground image in the three-dimensional scene from that distance.

Optionally, the extraction module further includes:

a second position module adapted to obtain position information of the specific object within the to-be-processed frame image according to the key point information, and to obtain left-right position information of the foreground image in the three-dimensional scene from the object's position within the frame.

Optionally, the device further includes:

a third position module adapted to acquire terrain information of the three-dimensional scene, and to obtain up-down position information of the foreground image in the scene according to the terrain information and the foreground image's left-right and/or depth position information.

Optionally, the fusion module is further adapted to:

fuse the foreground image with the three-dimensional scene according to the foreground image's depth, left-right, and/or up-down position information in the scene and obtain a processed frame image.

Optionally, the device further includes:

a map module adapted to draw an effect map on a specific region of the specific object in the foreground image.

Optionally, the three-dimensional scene contains weather information that changes in real time.

Optionally, the three-dimensional scene contains variable lighting information.

Optionally, the device further includes:

an upload module adapted to upload the processed video data to one or more cloud video platform servers, so that the cloud video platform servers can display the video data on the cloud video platform.

According to yet another aspect of the present invention, a computing device is provided, including a processor, a memory, a communication interface, and a communication bus, where the processor, the memory, and the communication interface communicate with one another through the communication bus.

The memory is used to store at least one executable instruction, and the executable instruction causes the processor to perform the operations corresponding to the above virtual-world-based video data processing method.

According to still another aspect of the present invention, a computer storage medium is provided. The storage medium stores at least one executable instruction, and the executable instruction causes a processor to perform the operations corresponding to the above virtual-world-based video data processing method.

According to the virtual-world-based video data processing method, device, computing device, and storage medium provided by the present invention: video data is acquired; the video data is screened to obtain a to-be-processed frame image containing a specific object; scene segmentation is performed on the frame image to obtain a foreground image of the specific object; a three-dimensional scene is rendered; key information of the specific object is extracted from the frame image, and the position of the foreground image in the three-dimensional scene is obtained from that key information; the foreground image is fused with the scene according to the position information to obtain a processed frame image; and the processed frame image overwrites the to-be-processed frame image to yield the processed video data. After screening the video data and obtaining the to-be-processed frame image containing the specific object, the invention segments the foreground image of the specific object out of that frame. From the key information extracted from the frame, the position of the foreground image in the three-dimensional scene is obtained, which makes it straightforward to fuse the foreground image with the scene; the resulting processed video shows the specific object situated inside the three-dimensional scene. The invention uses a deep learning method to perform scene segmentation efficiently and accurately. It places no demands on the user's technical skill and requires no manual editing: the video is processed automatically, greatly saving the user's time.

The above description is only an overview of the technical solution of the present invention. It is provided so that the technical means of the present invention can be understood more clearly and implemented according to the contents of this specification, and so that the above and other objects, features, and advantages of the present invention become more apparent. Specific embodiments of the present invention are set forth below.

Brief Description of the Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be considered limiting. Throughout the drawings, the same reference numerals designate the same parts. In the drawings:

Fig. 1 shows a flowchart of a virtual-world-based video data processing method according to one embodiment of the present invention;

Fig. 2 shows a flowchart of a virtual-world-based video data processing method according to another embodiment of the present invention;

Fig. 3 shows a functional block diagram of a virtual-world-based video data processing device according to one embodiment of the present invention;

Fig. 4 shows a functional block diagram of a virtual-world-based video data processing device according to another embodiment of the present invention;

Fig. 5 shows a schematic structural diagram of a computing device according to one embodiment of the present invention.

Detailed Description

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the present disclosure, it should be understood that the present disclosure may be embodied in various forms and should not be limited by the embodiments set forth herein. Rather, these embodiments are provided so that the present disclosure will be understood more thoroughly and its scope conveyed fully to those skilled in the art.

In the present invention, the specific object may be any object in an image, such as a human body, a plant, or an animal. The embodiments use a human body as the example, but the invention is not limited to human bodies.

Fig. 1 shows a flowchart of a virtual-world-based video data processing method according to one embodiment of the present invention. As shown in Fig. 1, the method includes the following steps.

Step S101: acquire video data.

The acquired video data may be the user's local video data or video data from the network. Alternatively, the method may acquire video data synthesized from multiple local pictures, from multiple network pictures, or from a mix of local and network pictures.
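Synthesizing video data from still pictures can be as simple as holding each picture for a fixed duration at the target frame rate. This is a hedged sketch: the function name, default duration, and frame rate are invented, and real frames would be decoded images rather than opaque values.

```python
def frames_from_images(images, seconds_per_image=1.0, fps=25):
    """Expand still pictures (local and/or network) into a frame
    sequence by repeating each image for seconds_per_image seconds
    at the given frame rate."""
    repeats = int(seconds_per_image * fps)
    return [img for img in images for _ in range(repeats)]
```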

Step S102: screen the video data to obtain to-be-processed frame images containing a specific object.

Video data contains many frame images, so it must be screened. Because the present invention processes a specific object, screening yields the to-be-processed frame images that contain that object.

Step S103: perform scene segmentation on the to-be-processed frame image to obtain a foreground image of the specific object.

The to-be-processed frame image contains a specific object, such as a human body. Scene segmentation mainly separates the specific object from the to-be-processed frame image, producing a foreground image of the specific object; the foreground image may contain only the specific object.

Scene segmentation of the to-be-processed frame image can use a deep learning method. Deep learning is a branch of machine learning based on learning representations of data. An observation (for example, an image) can be represented in many ways: as a vector of per-pixel intensity values, or more abstractly as a set of edges, regions of particular shapes, and so on. Certain representations make it easier to learn tasks from examples (for example, face recognition or facial expression recognition). A deep-learning-based human body segmentation method can segment the frame image and produce a foreground image containing the human body. Further, the foreground obtained by segmentation may contain the whole human body or only most of it; no limitation is imposed here.
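A trained segmentation network would output a per-pixel foreground mask; the sketch below only shows how such a mask, once obtained, isolates the foreground. The mask here is supplied by hand, as a stand-in for the deep learning model's output.

```python
def apply_segmentation_mask(frame, mask):
    """Keep pixels where the (model-produced) binary mask is 1 and
    blank out the background with None, leaving only the specific
    object's foreground."""
    return [[px if keep else None for px, keep in zip(row, mask_row)]
            for row, mask_row in zip(frame, mask)]
```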

Step S104: render a three-dimensional scene.

The three-dimensional scene may be a purely virtual scene, or a real scene converted into three dimensions. It may contain objects such as forests, waterfalls, and lakes; its specific content is not limited.

The three-dimensional scene may also contain weather information that changes in real time, such as overcast, sunny, or rainy conditions. Real-time weather changes make the rendered scene more realistic and vivid. The scene may further contain variable lighting: sunlight on clear days, lightning effects when it rains, or, in a dark scene, the glow of flying fireflies (the fireflies can be set to fly at designated positions, or to fly around the specific object after the subsequent fusion), keeping the whole three-dimensional scene coherent.

Any rendering technique may be used to draw the three-dimensional scene; no limitation is imposed here.

Step S105: extract key information of the specific object from the to-be-processed frame image, and obtain position information of the foreground image in the three-dimensional scene according to the key information.

The key information extracted from the to-be-processed frame image may specifically be key point information, key region information, and/or key line information. The embodiments of the present invention use key point information as the example, but the key information of the invention is not limited to key points. Using key points improves the speed and efficiency of computing position information: positions can be obtained from the key points directly, with no further calculation, analysis, or other complex processing of the key information. Key points are also easy to extract and accurate, making the resulting positions more precise. Because position information is generally derived from key points at the edge of the specific object, extraction from the frame image can target key points located on the object's edge. When the specific object is a human body, the extracted key points include points on the edge of the face, points on the edge of the body, and so on.
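Edge keypoints can be illustrated by taking the extreme points of a binary object mask. This is a crude, hypothetical stand-in for a learned keypoint detector, used only to make the notion of "key points on the object's edge" concrete.

```python
def edge_keypoints(mask):
    """Return the left/right/top/bottom extremes of a binary object
    mask as a toy substitute for detected edge keypoints."""
    points = [(x, y) for y, row in enumerate(mask)
                     for x, v in enumerate(row) if v]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {"left": min(xs), "right": max(xs),
            "top": min(ys), "bottom": max(ys)}
```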

Position information in the three-dimensional scene specifically comprises left-right, up-down, and depth positions, corresponding to positions along the scene's x-, y-, and z-axes respectively. From the key point information of the specific object extracted from the to-be-processed frame image, the position of the foreground image in the three-dimensional scene can be determined accordingly.

Step S106: fuse the foreground image with the three-dimensional scene according to the position information to obtain a processed frame image.

According to the position information, the foreground image is placed at the corresponding position in the three-dimensional scene and fused with it, producing the processed frame image. So that the foreground image blends into the scene better, the edges of the segmented foreground are made semi-transparent during the segmentation of the to-be-processed frame image, blurring the specific object's outline for a smoother fusion.
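The fusion with edge feathering can be sketched per pixel with alpha blending: interior foreground pixels keep alpha 1, while edge pixels get a reduced alpha so the silhouette fades into the scene. The RGB-tuple representation is an assumption for the sketch.

```python
def blend_pixel(fg_rgb, bg_rgb, alpha):
    """Composite one foreground pixel over the scene pixel; alpha < 1
    on edge pixels produces the semi-transparent, blurred outline."""
    return tuple(round(f * alpha + b * (1 - alpha))
                 for f, b in zip(fg_rgb, bg_rgb))
```

An interior pixel (alpha 1.0) is copied unchanged; an edge pixel at alpha 0.5 is an even mix of foreground and scene.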

Step S107: overwrite the to-be-processed frame image with the processed frame image to obtain the processed video data.

The processed frame image directly replaces the corresponding to-be-processed frame image, directly yielding the processed video data.
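The overwrite step amounts to an index-based replacement: processed frames slot back into the original sequence and every other frame passes through unchanged. A minimal sketch with invented names:

```python
def overwrite_frames(frames, processed_by_index):
    """Replace each to-be-processed frame with its processed version,
    keyed by frame index; untouched frames pass through unchanged."""
    return [processed_by_index.get(i, frame)
            for i, frame in enumerate(frames)]
```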

根据本发明提供的基于虚拟世界的视频数据处理方法,获取视频数据;对视频数据进行甄别,获取包含特定对象的待处理的帧图像;对待处理的帧图像进行场景分割处理,得到针对于特定对象的前景图像;绘制三维场景;从待处理的帧图像中提取出特定对象的关键信息,根据关键信息得到前景图像在三维场景中的位置信息;根据位置信息,将前景图像与三维场景进行融合处理,得到处理后的帧图像;将处理后的帧图像覆盖待处理的帧图像得到处理后的视频数据。本发明在对视频数据进行甄别,获取包含特定对象的待处理的帧图像后,从待处理的帧图像中分割出特定对象的前景图像。根据从待处理的帧图像中提取出特定对象的关键信息,得到前景图像在三维场景中的位置信息,便于将前景图像和三维场景进行融合,得到的处理后的视频呈现出特定对象位于三维场景中的效果。本发明采用了深度学习方法,实现了高效率高精准性地完成场景分割处理。且对用户技术水平不做限制,不需要用户手动对视频进行处理,自动实现对视频的处理,大大节省用户时间。According to the video data processing method based on the virtual world provided by the present invention, the video data is obtained; the video data is screened to obtain the frame image to be processed containing a specific object; the frame image to be processed is subjected to scene segmentation processing to obtain the foreground image; draw the 3D scene; extract the key information of a specific object from the frame image to be processed, and obtain the position information of the foreground image in the 3D scene according to the key information; according to the position information, fuse the foreground image with the 3D scene , to obtain the processed frame image; the processed frame image is overlaid on the frame image to be processed to obtain the processed video data. The present invention screens the video data and acquires the to-be-processed frame image containing the specific object, and then segments the foreground image of the specific object from the to-be-processed frame image. According to the key information of the specific object extracted from the frame image to be processed, the position information of the foreground image in the 3D scene is obtained, which facilitates the fusion of the foreground image and the 3D scene, and the obtained processed video shows that the specific object is located in the 3D scene. in the effect. The present invention adopts a deep learning method to realize scene segmentation processing with high efficiency and high precision. 
And there is no limit to the user's technical level, and the user does not need to manually process the video, and automatically realizes the video processing, which greatly saves the user's time.

图2示出了根据本发明另一个实施例的基于虚拟世界的视频数据处理方法的流程图。如图2所示,基于虚拟世界的视频数据处理方法具体包括如下步骤:Fig. 2 shows a flowchart of a method for processing video data based on a virtual world according to another embodiment of the present invention. As shown in Figure 2, the video data processing method based on the virtual world specifically includes the following steps:

步骤S201,获取视频数据。Step S201, acquiring video data.

获取的视频数据可以是用户本地的视频数据,也可以获取网络的视频数据。或者还可以获取由多个本地图片合成的视频数据,或者获取由多个网络图片合成的视频数据,或者获取由多个本地图片和多个网络图片合成的视频数据。The acquired video data may be local video data of the user, or acquired video data of the network. Alternatively, video data synthesized from multiple local pictures, or video data synthesized from multiple network pictures, or video data synthesized from multiple local pictures and multiple network pictures may also be acquired.

Step S202: screen the video data within a user-specified time period to obtain frame images to be processed that contain a specific object.

Video data contains many frames, so it must be screened. During screening, only the video data within the user-specified time period needs to be examined; video data in other time periods can be skipped. For example, because the second half of a video is often its climax, the user frequently specifies the second half as the time period of interest. In that case only the video data within the user-specified time period is screened, and the frames containing the specific object within that period are obtained as the frames to be processed.
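The screening step above can be sketched as follows. This is a minimal illustration, not the patented implementation: `contains_specific_object` is a hypothetical stand-in for whatever detector is used (e.g. a face or human-body detector), and frames are represented abstractly.

```python
def screen_frames(frames, fps, start_sec, end_sec, contains_specific_object):
    """Return (index, frame) pairs inside the user-specified time period
    that contain the specific object; frames outside the period are skipped
    without being examined."""
    first = int(start_sec * fps)
    last = int(end_sec * fps)
    selected = []
    for idx, frame in enumerate(frames):
        if idx < first or idx > last:
            continue  # outside the user-specified period: no screening needed
        if contains_specific_object(frame):
            selected.append((idx, frame))
    return selected

# Toy example: frames are dicts; the "specific object" is a person flag.
frames = [{"person": i % 2 == 0} for i in range(10)]
hits = screen_frames(frames, fps=1, start_sec=5, end_sec=9,
                     contains_specific_object=lambda f: f["person"])
```

With the toy data above, only even-indexed frames within seconds 5–9 are selected.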

Step S203: perform scene segmentation on the frame image to be processed to obtain a foreground image of the specific object.

Step S204: draw a three-dimensional scene.

For these steps, refer to the description of steps S103–S104 in the embodiment of Fig. 1; the details are not repeated here.

Step S205: draw an effect map over a specific region of the specific object in the foreground image.

When the foreground image obtained for the specific object covers only part of the object, e.g. a foreground image containing only the upper half of a human body, an effect map can be drawn over a specific region of the specific object for occlusion or beautification. Specifically, an effect map such as clouds can be drawn below the upper body to create the effect of the body floating in the air. Different effect maps can be chosen for different three-dimensional scenes and specific objects, so that the effect map matches the style and display effect of the scene and the object and the overall result looks consistent.

Step S206: extract key point information of the specific object from the frame image to be processed.

Key point information of the specific object is extracted from the frame image to be processed. It includes key points on the edge of the specific object and may also include key points of specific regions of the object, facilitating the subsequent computations based on the key points.

Step S207: according to the key point information of the specific object, calculate the distance between at least two key points that have a symmetric relationship.

Step S208: from the distance between the at least two symmetric key points, obtain the depth position information of the foreground image in the three-dimensional scene.

Because the distance between the specific object and the image capture device varies, the size of the specific object in the frame image to be processed also varies. For example, when the human body is far from the capture device it appears small in the frame, and when it is close it appears large. From the key point information of the specific object, the distance between at least two symmetric key points can be calculated, e.g. the distance between the key points at the two eye corners on the edge of the face. From that distance, combined with the object's actual physical dimensions, the distance between the specific object and the capture device can be derived. From this distance, the depth position of the foreground image in the three-dimensional scene is obtained, i.e. the specific depth at which the foreground image is placed when fused with the scene. For example, if the distance between the two eye-corner key points indicates that the body is far from the capture device, the body appears small in the frame and the segmented foreground image is correspondingly small, so the foreground image is placed at a deeper position in the scene, producing the effect of a small body deep inside the three-dimensional scene. Conversely, if the eye-corner distance indicates that the body is close to the capture device, the body appears large in the frame, and the foreground image is placed nearer the front of the scene, producing the effect of a large body near the front. In short, the depth position of the foreground image in the three-dimensional scene is correlated with the distance between the at least two symmetric key points.
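The relationship described above (smaller symmetric-key-point distance in pixels → farther object → deeper placement) can be sketched with a pinhole-camera approximation. This is an illustrative assumption, not the patent's stated formula; the 0.09 m interocular width and 800 px focal length are hypothetical values.

```python
def depth_from_keypoints(p_left, p_right, real_width_m, focal_px):
    """Estimate object-to-camera distance from two symmetric key points
    (e.g. the two eye corners), assuming a pinhole camera:
    depth = focal_length_px * real_width / pixel_width."""
    pixel_dist = ((p_left[0] - p_right[0]) ** 2 +
                  (p_left[1] - p_right[1]) ** 2) ** 0.5
    return focal_px * real_width_m / pixel_dist

# A wider eye-corner distance in pixels means a closer face, hence a
# smaller estimated depth; a narrower one means a farther face.
near = depth_from_keypoints((100, 200), (160, 200), real_width_m=0.09, focal_px=800)
far = depth_from_keypoints((100, 200), (115, 200), real_width_m=0.09, focal_px=800)
```

The estimated depth can then be used directly as the foreground image's z-axis position in the scene, or scaled into the scene's coordinate range.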

Step S209: according to the key point information, obtain the position information of the specific object in the frame image to be processed.

Step S210: according to the position of the specific object in the frame image to be processed, obtain the left-right position information of the foreground image in the three-dimensional scene.

The position information is computed from the key point information of the specific object and gives the object's concrete position in the frame image to be processed. Here that position information includes the object's left-right position, its up-down position, its rotation angle, and so on. From the position of the specific object in the frame image to be processed, the left-right position of the foreground image in the three-dimensional scene is obtained; the two correspond to each other. Further, the up-down position and rotation angle of the foreground image in the three-dimensional scene can also be set according to the object's up-down position and rotation angle in the frame image.
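The correspondence between the object's horizontal position in the frame and the foreground image's left-right (x-axis) position in the scene can be sketched as a simple linear mapping. This is one plausible interpretation under the assumption of a linear scene coordinate range; the scene bounds are hypothetical.

```python
def lateral_position(object_center_x, frame_width, scene_x_min, scene_x_max):
    """Map the object's horizontal centre in the frame (in pixels) to a
    left-right (x-axis) coordinate in the 3D scene, so the foreground
    image's lateral placement mirrors the object's placement in the frame."""
    t = object_center_x / frame_width            # normalised 0..1 across the frame
    return scene_x_min + t * (scene_x_max - scene_x_min)

# An object centred in a 1920-px-wide frame lands at the scene's centre.
x = lateral_position(object_center_x=960, frame_width=1920,
                     scene_x_min=-10.0, scene_x_max=10.0)
```

The same mapping, applied to the vertical pixel coordinate, would give a first estimate of the up-down position before the terrain adjustment of steps S211–S212.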

Step S211: acquire terrain information of the three-dimensional scene.

Step S212: according to the terrain information of the three-dimensional scene and the foreground image's left-right and/or depth position in the scene, obtain the up-down position information of the foreground image in the three-dimensional scene.

The terrain information of the three-dimensional scene is acquired; it includes the left-right, up-down, and depth positions of terrain features such as steps, stones, and lakes within the scene. Combining the terrain information with the foreground image's left-right position and depth position yields the foreground image's up-down position in the scene. Specifically, the foreground image's left-right and depth positions first determine which terrain lies at that location in the scene. If the terrain is a step, the foreground image's up-down position is adjusted according to the step's up-down position, avoiding the situation where the specific object is placed halfway through a step. If the terrain is a stone, the foreground image's up-down position is adjusted according to the stone's up-down position so that the specific object is not stuck inside the stone; the object can instead be placed on top of the stone or behind it. The up-down position of the foreground image in the scene therefore varies with the object's left-right and/or depth position and with the terrain information at that location; the specific behavior is set according to the implementation.
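The terrain lookup described above can be sketched as a height-map query. This is a toy illustration under the assumption that terrain heights are stored per integer (x, z) cell; a real scene would query its terrain mesh or collision geometry.

```python
def vertical_position(x, z, height_map):
    """Place the foreground image on top of the terrain at lateral/depth
    position (x, z): look up the terrain height there and use it as the
    image's up-down (y-axis) coordinate, so the object stands on steps or
    stones instead of intersecting them. `height_map` is a hypothetical
    dict keyed by integer (x, z) cells."""
    cell = (int(round(x)), int(round(z)))
    return height_map.get(cell, 0.0)  # 0.0 = flat ground where no feature exists

terrain = {(2, 5): 0.4, (3, 5): 0.8}    # two steps rising along the x axis
y_on_step = vertical_position(3.1, 5.0, terrain)
y_on_ground = vertical_position(0.0, 0.0, terrain)
```

Moving the object's left-right or depth position onto a different cell changes the looked-up height, which is exactly the dependence described in the paragraph above.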

Step S213: according to the foreground image's depth, left-right, and/or up-down position in the three-dimensional scene, fuse the foreground image with the scene to obtain the processed frame image.

According to the obtained depth, left-right, and/or up-down position information, the foreground image is set at the corresponding position in the three-dimensional scene and fused with it, yielding the processed frame image.

Step S214: overwrite the frame image to be processed with the processed frame image to obtain the processed video data.

The processed frame image directly overwrites the corresponding frame image to be processed, immediately yielding the processed video data.
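The overwrite step can be sketched as an in-place replacement keyed by frame index, a minimal illustration with frames represented abstractly:

```python
def overwrite_frames(frames, processed):
    """Replace each frame to be processed with its processed version,
    in place; `processed` maps frame index -> processed frame. Frames
    that were never selected for processing are left untouched, so the
    mutated list is the processed video data."""
    for idx, new_frame in processed.items():
        frames[idx] = new_frame
    return frames

video = ["f0", "f1", "f2", "f3"]
result = overwrite_frames(video, {1: "f1*", 3: "f3*"})
```

Because only the selected frames are replaced, the rest of the video is byte-identical to the original, which is why no separate re-assembly step is needed.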

Step S215: upload the processed video data to one or more cloud video platform servers, so that the servers can display the video data on their cloud video platforms.

The processed video data can be saved locally for the user's own viewing, or uploaded directly to one or more cloud video platform servers, such as iQiyi, Youku, or Kuai Video, so that those servers can display the video data on their cloud video platforms.

According to the virtual-world-based video data processing method provided by the present invention, key point information of the specific object is extracted from the frame image to be processed, and from it the foreground image's depth and left-right position in the three-dimensional scene are obtained. The foreground image's up-down position in the scene is then adjusted according to the scene's terrain information, so that the specific object can be fused with the three-dimensional scene in a physically reasonable way: the fused video presents a realistic display effect, avoiding the display errors that occur when the object is simply placed in the scene without regard for its concrete terrain. In addition, an effect map is drawn over a specific region of the specific object in the foreground image to enrich and beautify its display. Further, the processed video data can be uploaded directly to one or more cloud video platform servers for display on their cloud video platforms. The present invention places no requirement on the user's technical skill: the user does not need to process the video manually, the processing is performed automatically, and a great deal of user time is saved.

Fig. 3 shows a functional block diagram of a virtual-world-based video data processing apparatus according to an embodiment of the present invention. As shown in Fig. 3, the apparatus includes the following modules:

An acquisition module 301, adapted to acquire video data.

The video data acquired by the acquisition module 301 may be the user's local video data or video data from a network. The acquisition module 301 may also acquire video data synthesized from multiple local pictures, from multiple network pictures, or from a combination of local pictures and network pictures.

A screening module 302, adapted to screen the video data and obtain frame images to be processed that contain a specific object.

Video data contains many frames, so the screening module 302 must screen it. Because the present invention processes a specific object, the screening module 302 obtains, after screening, the frame images to be processed that contain that object.

During screening, the screening module 302 may also, according to a user-specified time period, examine only the video data within that period and skip the video data of other periods. For example, because the second half of a video is often its climax, the user frequently specifies the second half as the period of interest. The screening module 302 then screens only the video data within the user-specified period and obtains the frames containing the specific object within that period.

A segmentation module 303, adapted to perform scene segmentation on the frame image to be processed and obtain a foreground image of the specific object.

The frame image to be processed contains a specific object, such as a human body. The segmentation module 303 performs scene segmentation on it, chiefly separating the specific object from the frame to obtain a foreground image of that object; the foreground image may contain only the specific object.

The segmentation module 303 may use a deep learning method when segmenting the frame image to be processed. Deep learning is a branch of machine learning based on representation learning of data. An observation (e.g. an image) can be represented in many ways, such as a vector of per-pixel intensity values, or more abstractly as a set of edges or regions of particular shapes; certain representations make it easier to learn tasks (e.g. face recognition or facial-expression recognition) from examples. For instance, the segmentation module 303 can apply a deep-learning human-body segmentation method to the frame image and obtain a foreground image containing the body. When doing so it may obtain the entire body or only most of it; this is not limited here.

A drawing module 304, adapted to draw a three-dimensional scene.

The three-dimensional scene drawn by the drawing module 304 may be a three-dimensional virtual scene, or the module may convert a real scene into three-dimensional form. The scene may contain objects such as forests, waterfalls, and lakes; its specific content is not limited.

The scene drawn by the drawing module 304 also contains weather information that changes in real time, such as cloudy, sunny, or rainy conditions, making the scene more realistic and its presentation more vivid. The scene further contains variable lighting information: sunlight on a clear day, lightning when it rains, and, in a dark scene, the glow of flying fireflies (the fireflies drawn by the drawing module 304 can be set to fly at a specified position, or drawn to fly around the specific object when the subsequent fusion module 306 performs fusion), all of which makes the whole three-dimensional scene more coherent. The drawing module 304 may use any rendering technique to draw the scene; this is not limited here.

An extraction module 305, adapted to extract key information of the specific object from the frame image to be processed and to obtain, from that key information, the position of the foreground image in the three-dimensional scene.

The extraction module 305 extracts key information of the specific object from the frame image to be processed; the key information may specifically be key point information, key region information, and/or key line information. The embodiments of the present invention take key point information as the example, but the key information of the present invention is not limited to key points. Using key points improves the speed and efficiency of deriving position information: the position can be obtained from the key points directly, without further computation, analysis, or other complex operations on the key information. Key points are also easy to extract and are extracted accurately, making the resulting position information more precise. Because position information is generally derived from key points at the edge of the specific object, the extraction module 305 may extract the key points lying on the object's edge. When the specific object is a human body, the extracted key points include those on the edge of the face, those on the edge of the body, and so on.

The position information in the three-dimensional scene specifically comprises the left-right, up-down, and depth positions in the scene, corresponding respectively to the x-, y-, and z-axis directions. From the key point information of the specific object extracted from the frame image to be processed, the extraction module 305 can determine the corresponding position of the foreground image in the three-dimensional scene.

A fusion module 306, adapted to fuse the foreground image with the three-dimensional scene according to the position information and obtain the processed frame image.

According to the position information, the fusion module 306 sets the foreground image at the corresponding position in the three-dimensional scene and fuses it with the scene to obtain the processed frame image. So that the fusion module 306 can blend the foreground image into the scene more smoothly, the segmentation module 303, when segmenting the frame image to be processed, renders the edge of the segmented foreground semi-transparent, blurring the edge of the specific object for better fusion.
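The semi-transparent-edge fusion described above is, in effect, per-pixel alpha compositing. A minimal per-pixel sketch, assuming RGB tuples and a precomputed alpha value (1 for the object's interior, intermediate values along its blurred edge):

```python
def fuse_pixel(fg, bg, alpha):
    """Blend a foreground pixel over a scene (background) pixel.
    `alpha` is the foreground coverage in [0, 1]; edge pixels of the
    segmented object carry intermediate alpha (the semi-transparent,
    blurred edge), interior pixels carry alpha = 1, so the object
    blends smoothly into the rendered scene."""
    return tuple(round(f * alpha + b * (1.0 - alpha)) for f, b in zip(fg, bg))

interior = fuse_pixel((200, 180, 160), (10, 40, 90), alpha=1.0)  # pure foreground
edge = fuse_pixel((200, 180, 160), (10, 40, 90), alpha=0.5)      # soft edge
```

Applying this over the whole foreground mask, at the position computed in the earlier steps, yields the fused frame.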

An overlay module 307, adapted to overwrite the frame image to be processed with the processed frame image to obtain the processed video data.

The overlay module 307 directly overwrites the corresponding frame image to be processed with the processed frame image, immediately yielding the processed video data.

According to the virtual-world-based video data processing apparatus provided by the present invention, video data is acquired; the video data is screened to obtain frame images to be processed that contain a specific object; scene segmentation is performed on the frame images to obtain a foreground image of the specific object; a three-dimensional scene is drawn; key information of the specific object is extracted from the frame image to be processed, and the position of the foreground image in the three-dimensional scene is obtained from it; according to the position information, the foreground image is fused with the scene to obtain a processed frame image; and the processed frame image overwrites the frame image to be processed, yielding the processed video data. After screening the video data and obtaining the frames containing the specific object, the present invention segments the object's foreground image from those frames. From the object's key information extracted from the frame, the position of the foreground image in the scene is obtained, facilitating the fusion of the foreground image and the scene; the resulting processed video shows the specific object situated inside the three-dimensional scene. The present invention adopts a deep learning method, completing scene segmentation with high efficiency and high accuracy. No requirement is placed on the user's technical skill: the user does not need to process the video manually, the processing is performed automatically, and a great deal of user time is saved.

Fig. 4 shows a functional block diagram of a virtual-world-based video data processing apparatus according to another embodiment of the present invention. As shown in Fig. 4, the difference from Fig. 3 is that the apparatus further includes:

A texture module 308, adapted to draw an effect map over a specific region of the specific object in the foreground image.

When the foreground image obtained by the segmentation module 303 covers only part of the specific object, e.g. only the upper half of a human body, the texture module 308 can draw an effect map over a specific region of the specific object for occlusion or beautification. Specifically, the texture module 308 can draw an effect map such as clouds below the upper body to create the effect of the body floating in the air. Different effect maps can be chosen for different three-dimensional scenes and specific objects, so that the effect map matches the style and display effect of the scene and the object and the overall result looks consistent.

The extraction module 305 further comprises a first position module 309 and a second position module 310.

A first position module 309, adapted to calculate, from the key point information of the specific object, the distance between at least two key points having a symmetric relationship, and to obtain, from that distance, the depth position of the foreground image in the three-dimensional scene.

Because the distance between the specific object and the image capture device varies, the size of the object in the frame image to be processed also varies: a body far from the capture device appears small in the frame, and a body close to it appears large. From the object's key point information, the first position module 309 can calculate the distance between at least two symmetric key points, e.g. the distance between the key points at the two eye corners on the edge of the face. From that distance, combined with the object's actual physical dimensions, the distance between the object and the capture device can be derived. From this distance the first position module 309 obtains the depth position of the foreground image in the three-dimensional scene, i.e. the specific depth at which the foreground image is placed when fused with the scene. For example, if the eye-corner distance computed by the first position module 309 indicates that the body is far from the capture device, the body appears small in the frame and the segmented foreground image is correspondingly small, so the module places the foreground image at a deeper position in the scene, producing the effect of a small body deep inside the scene. Conversely, if the eye-corner distance indicates that the body is close to the capture device, the body appears large in the frame, and the module places the foreground image nearer the front, producing the effect of a large body near the front of the scene. The depth position of the foreground image in the three-dimensional scene is correlated with the distance between the at least two symmetric key points.

The second position module 310 is adapted to obtain the position information of the specific object in the frame image to be processed according to the key point information, and to derive the left-right position information of the foreground image in the three-dimensional scene from the position of the specific object in the frame image to be processed.

The second position module 310 computes, from the key point information of the specific object, the specific position of the object in the frame image to be processed. Here the position information of the specific object in the frame image includes its left-right position, its up-down position, its rotation angle, and similar information. From the object's position in the frame image, the second position module 310 obtains the left-right position information of the foreground image in the 3D scene, where the left-right position of the foreground image in the scene corresponds to the left-right position of the specific object in the frame image. Further, the second position module 310 may also set the up-down position and rotation angle of the foreground image in the 3D scene according to the up-down position and rotation angle of the specific object in the frame image to be processed.
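The correspondence the second position module establishes can be illustrated by normalizing the object's horizontal pixel position and scaling it into the scene's coordinate range. The scene extent and the centered-coordinate convention are assumptions made for this sketch:

```python
# Sketch of mapping the object's horizontal position in the frame to a
# left-right coordinate in the 3D scene.
def frame_x_to_scene_x(object_center_x, frame_width, scene_half_width=5.0):
    """Map a pixel x-coordinate to a scene x in [-scene_half_width, +scene_half_width].

    An object at the left edge of the frame maps to the left edge of the
    scene, an object at the right edge to the right edge, preserving the
    correspondence between frame position and scene position.
    """
    normalized = object_center_x / frame_width       # 0.0 (left) .. 1.0 (right)
    return (normalized * 2.0 - 1.0) * scene_half_width

# An object centered in a 1280-px-wide frame sits at scene x = 0.0;
# one at the right edge sits at x = +5.0.
center = frame_x_to_scene_x(640, 1280)   # -> 0.0
edge = frame_x_to_scene_x(1280, 1280)    # -> 5.0
```

The same normalization scheme could be applied to the up-down position and, with an angle instead of a coordinate, to the rotation the module also carries over.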

The third position module 311 is adapted to obtain terrain information of the 3D scene and, from that terrain information together with the left-right position information and/or depth position information of the foreground image in the 3D scene, to derive the up-down position information of the foreground image in the 3D scene.

The third position module 311 acquires the terrain information of the 3D scene, which includes the left-right, up-down, and depth positions of terrain features such as steps, stones, and lakes in the scene. Combining this terrain information with the left-right position information, the depth position information, and so on of the foreground image, the module derives the up-down position of the foreground image in the 3D scene. Specifically, the third position module 311 first determines, from the left-right and depth position information of the foreground image, the terrain of the 3D scene at that position. If the terrain there is a step, the module adjusts the up-down position of the foreground image according to the step's up-down position, so that the specific object is not placed halfway through a step.
If the terrain is a stone, the module adjusts the up-down position of the foreground image according to the stone's up-down position, so that the specific object is not stuck inside the stone; the object may instead be placed on top of the stone or behind it. The up-down position of the foreground image in the 3D scene therefore varies with the left-right and/or depth position of the specific object in the scene and with the terrain information of the scene at that position; the specific adjustment is set according to the implementation.
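The terrain adjustment can be sketched as a height-map lookup keyed by the already-determined left-right and depth positions. Representing the terrain as a small 2D grid, and the nearest-cell lookup, are illustrative assumptions; a real scene would use its own terrain query:

```python
# Sketch of deriving the foreground's up-down position from scene terrain.
def terrain_height(height_map, x, depth):
    """Nearest-cell lookup in a 2D height map indexed as height_map[depth][x]."""
    row = min(int(depth), len(height_map) - 1)
    col = min(int(x), len(height_map[0]) - 1)
    return height_map[row][col]

def place_on_terrain(height_map, x, depth, base_offset=0.0):
    """Vertical position = ground height at (x, depth) plus any figure offset.

    Standing the figure on the local ground height avoids the situations
    described above: a figure bisected by a step or stuck inside a stone.
    """
    return terrain_height(height_map, x, depth) + base_offset

# A 3x3 scene: flat ground in front, a stone in the middle, a step behind.
heights = [
    [0.0, 0.0, 0.0],   # near row: level ground
    [0.0, 0.5, 0.0],   # middle row: a 0.5-high stone in the center
    [1.0, 1.0, 1.0],   # far row: a 1.0-high step
]
on_stone = place_on_terrain(heights, 1, 1)  # -> 0.5 (on top of the stone)
on_step = place_on_terrain(heights, 0, 2)   # -> 1.0 (on the step)
```

Moving the same figure left-right or in depth changes which cell is queried, which is exactly the dependence on position and terrain that the passage above describes.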

The fusion module 306 can fuse the foreground image with the 3D scene according to the depth position information, left-right position information, and/or up-down position information obtained by the modules above, yielding the processed frame image.

The upload module 312 is adapted to upload the processed video data to one or more cloud video platform servers, so that those servers can display the video data on their cloud video platforms.

The processed video data can be stored locally for the user's own viewing, or the upload module 312 can upload it directly to one or more cloud video platform servers, such as those of iQiyi, Youku, or Kuai Video, for display on those cloud video platforms.

According to the virtual-world-based video data processing apparatus provided by the present invention, key point information of a specific object is extracted from the frame image to be processed, and the depth, left-right position, and other position information of the foreground image in the 3D scene are derived from it. The up-down position of the foreground image in the 3D scene is then adjusted according to the scene's terrain information, so that the specific object is fused with the 3D scene in a plausible way and the fused video presents a realistic display, avoiding the display errors that arise when an object is simply placed into a 3D scene without regard for its specific terrain. In addition, effect maps are drawn on specific regions of the specific object in the foreground image to enrich and beautify its display. Further, the processed video data can be uploaded directly to one or more cloud video platform servers for display on their cloud video platforms. The present invention places no demands on the user's technical skill and requires no manual video editing: the video is processed automatically, greatly saving the user's time.

The present application also provides a non-volatile computer storage medium storing at least one executable instruction, which can execute the virtual-world-based video data processing method of any of the above method embodiments.

FIG. 5 shows a schematic structural diagram of a computing device according to an embodiment of the present invention; the specific embodiments of the present invention do not limit the specific implementation of the computing device.

As shown in FIG. 5, the computing device may include: a processor 502, a communications interface 504, a memory 506, and a communication bus 508, wherein:

The processor 502, the communications interface 504, and the memory 506 communicate with one another through the communication bus 508.

The communications interface 504 is configured to communicate with network elements of other devices, such as clients or other servers.

The processor 502 is configured to execute the program 510, and may specifically perform the relevant steps of the above embodiments of the virtual-world-based video data processing method.

Specifically, the program 510 may include program code, which includes computer operation instructions.

The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention. The one or more processors included in the computing device may be processors of the same type, such as one or more CPUs, or of different types, such as one or more CPUs together with one or more ASICs.

The memory 506 is used to store the program 510. The memory 506 may include high-speed RAM and may also include non-volatile memory, for example at least one disk memory.

The program 510 may specifically be configured to cause the processor 502 to execute the virtual-world-based video data processing method of any of the above method embodiments. For the specific implementation of each step in the program 510, reference may be made to the corresponding descriptions of the steps and units in the above virtual-world-based video data processing embodiments, which are not repeated here. Those skilled in the art will appreciate that, for convenience and brevity of description, the specific working processes of the devices and modules described above may be found in the corresponding process descriptions of the foregoing method embodiments and are likewise not repeated here.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein, and the structure required to construct such a system is apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein can be implemented in a variety of programming languages, and the descriptions of specific languages above are given to disclose the best mode of the invention.

Numerous specific details are set forth in the description provided herein. It will be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail so as not to obscure an understanding of this description.

Similarly, it should be appreciated that, in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will understand that the modules in the devices of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. Modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into a plurality of sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.

Furthermore, those skilled in the art will understand that, although some embodiments described herein include certain features included in other embodiments and not others, combinations of features from different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the present invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the virtual-world-based video data processing apparatus according to the embodiments of the present invention. The present invention may also be implemented as a device or apparatus program (for example, a computer program or computer program product) for performing part or all of the methods described herein. Such a program implementing the present invention may be stored on a computer-readable medium or may take the form of one or more signals; such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third, and so on does not indicate any order; these words may be interpreted as names.

Claims (10)

1. A virtual-world-based video data processing method, comprising:
acquiring video data;
screening the video data to obtain a frame image to be processed that contains a specific object;
performing scene segmentation processing on the frame image to be processed to obtain a foreground image for the specific object;
drawing a three-dimensional scene;
extracting key information of the specific object from the frame image to be processed, and obtaining position information of the foreground image in the three-dimensional scene according to the key information;
fusing the foreground image with the three-dimensional scene according to the position information to obtain a processed frame image; and
overlaying the processed frame image onto the frame image to be processed to obtain processed video data.

2. The method according to claim 1, wherein the acquiring video data further comprises: acquiring local video data and/or network video data.

3. The method according to claim 1, wherein the acquiring video data further comprises: acquiring video data synthesized from a plurality of local pictures and/or a plurality of network pictures.

4. The method according to any one of claims 1-3, wherein the screening the video data to obtain a frame image to be processed that contains a specific object further comprises: screening the video data of a time period specified by the user to obtain a frame image to be processed that contains the specific object.

5. The method according to any one of claims 1-4, wherein the key information is key point information, and the extracting key information of the specific object from the frame image to be processed and obtaining position information of the foreground image in the three-dimensional scene according to the key information further comprises: extracting key point information located on the specific object from the frame image to be processed.

6. The method according to claim 5, wherein the extracting key information of the specific object from the frame image to be processed and obtaining position information of the foreground image in the three-dimensional scene according to the key information further comprises:
calculating the distance between at least two key points having a symmetrical relationship according to the key point information of the specific object; and
obtaining depth position information of the foreground image in the three-dimensional scene according to the distance between the at least two key points having a symmetrical relationship.

7. The method according to claim 5, wherein the extracting key information of the specific object from the frame image to be processed and obtaining position information of the foreground image in the three-dimensional scene according to the key information further comprises:
obtaining position information of the specific object in the frame image to be processed according to the key point information; and
obtaining left-right position information of the foreground image in the three-dimensional scene according to the position information of the specific object in the frame image to be processed.

8. A virtual-world-based video data processing apparatus, comprising:
an acquisition module adapted to acquire video data;
a screening module adapted to screen the video data to obtain a frame image to be processed that contains a specific object;
a segmentation module adapted to perform scene segmentation processing on the frame image to be processed to obtain a foreground image for the specific object;
a drawing module adapted to draw a three-dimensional scene;
an extraction module adapted to extract key information of the specific object from the frame image to be processed and to obtain position information of the foreground image in the three-dimensional scene according to the key information;
a fusion module adapted to fuse the foreground image with the three-dimensional scene according to the position information to obtain a processed frame image; and
an overlay module adapted to overlay the processed frame image onto the frame image to be processed to obtain processed video data.

9. A computing device, comprising: a processor, a memory, a communication interface, and a communication bus, wherein the processor, the memory, and the communication interface communicate with one another through the communication bus; and the memory is used to store at least one executable instruction that causes the processor to perform operations corresponding to the virtual-world-based video data processing method according to any one of claims 1-7.

10. A computer storage medium storing at least one executable instruction that causes a processor to perform operations corresponding to the virtual-world-based video data processing method according to any one of claims 1-7.
CN201710948050.6A (priority date 2017-10-12, filing date 2017-10-12): Video data processing method, device, and computing device based on virtual world. Status: Pending.

Priority Applications (1)

- CN201710948050.6A (priority date 2017-10-12, filing date 2017-10-12): Video data processing method, device, and computing device based on virtual world

Publications (1)

- CN107613161A, published 2018-01-19

Family

- ID=61068115

Family Applications (1)

- CN201710948050.6A (Pending; priority date 2017-10-12, filing date 2017-10-12): Video data processing method, device, and computing device based on virtual world

Country Status (1)

- CN: CN107613161A


Patent Citations (6)

- CN101110908A (priority 2007-07-20, published 2008-01-23), 西安宏源视讯设备有限责任公司: "Foreground depth of field position identification device and method for virtual studio system" *
- CN101309389A (priority 2008-06-19, published 2008-11-19), 深圳华为通信技术有限公司: "Method, apparatus and terminal synthesizing visual images" *
- CN106791347A (priority 2015-11-20, published 2017-05-31), 比亚迪股份有限公司: "A kind of image processing method, device and the mobile terminal using the method" *
- CN106303555A (priority 2016-08-05, published 2017-01-04), 深圳市豆娱科技有限公司: "A kind of live broadcasting method based on mixed reality, device and system" *
- CN106899781A (priority 2017-03-06, published 2017-06-27), 宇龙计算机通信科技(深圳)有限公司: "A kind of image processing method and electronic equipment" *
- CN107547804A (priority 2017-09-21, published 2018-01-05), 北京奇虎科技有限公司: "Realize the video data handling procedure and device, computing device of scene rendering" *

(* Cited by examiner, † Cited by third party)

Non-Patent Citations (1)

- Zhou Fan: "Research on registration and rendering methods for augmenting virtual three-dimensional scenes with video images" (周凡: 《视频影像增强虚拟三维场景的注册与渲染方法研究》), China Doctoral Dissertations Full-text Database, Information Science and Technology *

Cited By (8)

- CN109698914A (priority 2018-12-04, published 2019-04-30), 广州华多网络科技有限公司: "A kind of lightning special efficacy rendering method, device, equipment and storage medium" *
- CN109698914B (priority 2018-12-04, published 2022-03-01), 广州方硅信息技术有限公司: "Lightning special effect rendering method, device, equipment and storage medium" *
- CN111609854A (priority 2019-02-25, published 2020-09-01), 北京奇虎科技有限公司: "3D map construction method and sweeping robot based on multiple depth cameras" *
- CN111626919A (priority 2020-05-08, published 2020-09-04), 北京字节跳动网络技术有限公司: "Image synthesis method and device, electronic equipment and computer-readable storage medium" *
- CN111626919B (priority 2020-05-08, published 2022-11-15), 北京字节跳动网络技术有限公司: "Image synthesis method and device, electronic equipment and computer readable storage medium" *
- CN113223012A (priority 2021-04-30, published 2021-08-06), 北京字跳网络技术有限公司: "Video processing method and device and electronic device" *
- CN113223012B (priority 2021-04-30, published 2023-09-29), 北京字跳网络技术有限公司: "Video processing method and device and electronic device" *
- CN113949827A (priority 2021-09-30, published 2022-01-18), 安徽尚趣玩网络科技有限公司: "Video content fusion method and device" *

(* Cited by examiner, † Cited by third party)

Similar Documents

- CN107547804A: Realize the video data handling procedure and device, computing device of scene rendering
- KR102033262B1: Canonical reconstruction method, apparatus, terminal device and storage medium
- CN107613161A: Video data processing method, device, and computing device based on virtual world
- US10484599B2: Simulating depth of field
- CN107820027A: Video personage dress-up method, apparatus, computing device and computer-readable storage medium
- Zhang et al.: Personal photograph enhancement using internet photo collections
- CN107507155A: Video segmentation result edge optimization real-time processing method, device and computing device
- CN108111911B: Video data real-time processing method and device based on self-adaptive tracking frame segmentation
- CN107547803B: Video segmentation result edge optimization processing method and device and computing equipment
- CN107483892A: Video data real-time processing method and device, computing device
- CN107610149B: Image segmentation result edge optimization processing method and device and computing equipment
- CN107945188A: Personage dress-up method and device based on scene segmentation, computing device
- CN107633547A: Realize the view data real-time processing method and device, computing device of scene rendering
- CN108109161A: Video data real-time processing method and device based on adaptive threshold fuzziness
- CN107808372B: Image crossing processing method and device, computing equipment and computer storage medium
- CN107578369A: Video data processing method and device, computing device
- CN107743263B: Video data real-time processing method and device, and computing device
- CN107680105B: Real-time processing method, device and computing device for video data based on virtual world
- CN107563962A: Video data real-time processing method and device, computing device
- CN108171716A: Video personage dress-up method and device based on the segmentation of adaptive tracing frame
- CN107766803A: Video personage dress-up method, apparatus and computing device based on scene segmentation
- CN108010038B: Method and device for live-streaming clothing dress-up based on adaptive threshold segmentation
- CN107767391A: Landscape image processing method, device, computing device and computer-readable storage medium
- Liu et al.: Fog effect for photography using stereo vision
- CN107680170A: View synthesis method and device based on virtual world, computing device

Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- RJ01: Rejection of invention patent application after publication (application publication date: 2018-01-19)

