CN104050859A - Interactive digital stereoscopic sand table system - Google Patents

Interactive digital stereoscopic sand table system
Download PDF

Info

Publication number
CN104050859A
Authority
CN
China
Prior art keywords
gesture
gestures
dimensional
human
sand table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410193904.0A
Other languages
Chinese (zh)
Inventor
王元庆
董辰辰
李异同
陆大伟
马换
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201410193904.0A
Publication of CN104050859A
Legal status: Pending


Abstract

Translated from Chinese

An interactive digital stereoscopic sand table system is composed of two major parts: a stereoscopic image generation system and a human-computer interaction system. The stereoscopic image generation system draws three-dimensional terrain with a 3D engine such as OpenGL, applies eye tracking and directional-light technology to achieve unassisted stereoscopic display, and uses a series of image acceleration techniques to speed up the rendering process. Three-dimensional building models are placed in the target region of the three-dimensional terrain with precise spatial distribution and good visual effect; the stereoscopic image generation system constitutes a naked-eye stereoscopic display. The human-computer interaction system controls the three-dimensional terrain scene or three-dimensional model data in the image generation system through gestures, realized via a preset gesture-recognition interface: corresponding gestures zoom in, zoom out, pan, rotate, or enter the scene or model data, and gestures that distinguish different control meanings can be configured as needed.

Description

Translated from Chinese

Interactive digital stereoscopic sand table system

Technical Field

The invention relates to technology for generating free stereoscopic scenes viewed with the naked eye and for human-computer interaction, and belongs to the fields of information display technology and human-computer interaction.

Background Art

A sand table is a tool for expressing geographical information such as the three-dimensional distribution of terrain and the state of surface targets. It mainly presents terrain data, allowing people to understand macroscopic features from a microscopic perspective. Sand tables are usually built as scale models from topographic maps, aerial photographs, or the actual terrain, and are intuitive, simple to produce, economical, and practical.

Traditional sand tables are built from sand, wargame pieces, and other materials; they are slow to produce and inconvenient to transport, and can no longer meet the needs of the contemporary and future informatized battlefield. Modern warfare in particular has become large-area, three-dimensional, and multi-service, with a rapidly changing battle process and fleeting windows of opportunity. Developing a battlefield-situation display technology suited to the needs of modern warfare is therefore an urgent requirement.

With the continuous development of digitization and information technology, interactive digital sand tables have emerged. In recent years they have been applied increasingly to real-estate planning, education, and especially the military field. As integrated systems for virtual yet natural interaction between people and the combat environment, they largely solve problems of real combat and training such as excessive cost and environmental constraints, and have attracted growing attention from the militaries of many countries. However, current digital sand tables are all flat and cannot accurately express the situation of a three-dimensional scene.

The following is a comparative analysis of similar domestic patents obtained through the Nanjing University novelty search station.

Patent [1] (CN200920301807.3 [P], 2010-07-28) proposes a system "based on real-time interactive imaging technology" comprising an imaging module, an interaction module, and a base module. The imaging module is a projector array. The interaction module is divided into non-contact interaction, which collects images with infrared cameras, and contact interaction, which uses a touch screen as the input device; the image output is then sent on to the imaging module. The base module is made of material suitable for the imaging module to project onto.

Patent [2] (CN201110233828.8 [P], 2013-02-20) proposes "a gesture recognition method for an interactive electronic sand table", in which gestures interact with the electronic sand table: different gesture movements draw photoelectric motion trajectories on the sand table, pattern recognition applied to those trajectories realizes gesture recognition, and gestures ultimately control the electronic sand table. Trajectory tracking uses a minimum-distance algorithm, and the pattern recognition technique applied to the light-spot trajectories is a decision-tree algorithm.

Patent [3] (CN201110233542.X [P], 2012-09-05) proposes an "interactive sand table system based on gesture recognition", in which the sand table and a surround screen display content sent by a control terminal. A camera on the sand table captures the user's gesture information and sends it to the control terminal, which, by judging the time at which commands are received, parses the gesture information into corresponding control commands and sends them to the relevant controlled terminals; the sand table and surround screen receive the control commands and execute them. The large interactive projection screen uses a surround design with multiple projectors, combining synchronized projection control, curved-surface correction, and edge blending to achieve seamless splicing and synchronized projection on a very large surround screen.

Patent [1] contains no gesture recognition, its interaction uses a touch screen as the medium, and its 3D effect is produced by a projector array. Patent [2] is mainly a gesture recognition technique; its algorithm differs greatly from the present invention, and it does not involve the display of 3D images. Patent [3] uses a surround-screen design and projection technology for synchronized projection and captures gesture information with a camera at long range; these key techniques differ substantially from those adopted here, and the displayed image is not stereoscopic.

In summary, the present invention has the following characteristics:

The system generates stereoscopic images in real time and displays stereoscopic scenes in a naked-eye 3D manner. Under the user's gesture control it supports not only basic operations such as zooming in, zooming out, and rotating, but also functions such as marking, calibration, and measurement.

The present invention comprises three areas of technology: 1. human-computer interaction; 2. naked-eye stereoscopic display; 3. real-time stereoscopic image generation.

Human-computer interaction: as an independent and important research field, it has received very wide attention worldwide, and many approaches to it have been developed internationally. People can use keyboards and mice, joysticks, position trackers, data gloves, and other devices to control equipment and to convey commands and requirements through interaction devices. Since the late 1990s, with the rapid development and spread of high-speed processors, multimedia technology, and Internet/Web technology, research on human-computer interaction has focused on intelligent complementarity, multimodal (multi-channel) multimedia interaction, virtual interaction, and human-computer collaborative interaction, that is, on human-centered interaction technology. Worldwide, the technology is still in its early stages yet developing rapidly, and it has begun to be applied in areas such as film and entertainment. However, the design and research of human-computer interaction in China still lags comparable international work and lacks new interaction techniques, and the application of unassisted naked-eye stereoscopic display realized with directional-light and eye-detection technology is not yet mature.

Naked-eye 3D display: current 3D technology can be considered relatively mature, with more and more 3D products on the market, most widely in film and television. Although stereo-pair technology can provide a sense of depth, it is essentially just two or more planar images in space, imaged stereoscopically through parallax. Such technology not only reduces viewing comfort by relying on aids such as polarizers and is highly limited in operation, but also forces the eye into a conflict between lens accommodation and vergence, so prolonged viewing produces visual fatigue.

Summary of the Invention

The purpose of the present invention is to propose an interactive digital stereoscopic sand table system that uses naked-eye 3D and touch technology to realize highly realistic stereoscopic images and high-precision human-computer interaction. A large-screen (e.g. 55-inch) horizontal display table produces the 3D effect, and the sand table model seen by the eye appears suspended above the screen. Gesture recognizers are built into the frame of the display table; by recognizing gestures, the user can directly touch the 3D image they see to interact with it, including rotation, panning, zooming, and scene roaming, and by zooming in on a building can directly observe interior details, as if standing in the real scene.

The technical solution of the present invention is an interactive digital stereoscopic sand table system composed mainly of two parts: a stereoscopic image generation system and a human-computer interaction system.

1) Stereoscopic image generation system

The stereoscopic image generation system draws three-dimensional terrain with OpenGL, OpenCV, DirectX, or a 3D engine, applies eye tracking and directional-light technology to achieve unassisted stereoscopic display, and uses a series of image acceleration techniques to speed up the rendering process. Three-dimensional building models are placed in the target region of the terrain with precise spatial distribution and good visual effect; the stereoscopic image generation system constitutes a naked-eye stereoscopic display.

The stereoscopic image generation system is independently programmed, and the sand table model is highly detailed: it can display not only the entire terrain but also military models used in war, including tanks and aircraft, with vivid battle lighting effects. A preset gesture-recognition interface controls the three-dimensional terrain scene or three-dimensional model data through gestures. Because the terrain and military models are very complex, the program's data volume is very large. For data of this scale the system adopts an innovative data-management scheme: a unified three-dimensional terrain scene with block-partitioned three-dimensional military model data loaded in layers, so that the system is both an organic whole and able to run quickly, solving the problems of massive data and runtime efficiency.
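The block-partitioned loading scheme described above can be sketched as follows. This is a minimal illustration only; the class, the block keys, and the model names are assumptions for demonstration, not the patent's actual implementation.

```python
class SandTableScene:
    """Unified terrain scene with block-partitioned model data.

    Models are grouped into spatial blocks and loaded only when their
    block enters the active region, so the full data set never has to
    reside in memory at once.
    """

    def __init__(self, block_size):
        self.block_size = block_size
        self.model_store = {}   # block key -> model names (on "disk")
        self.loaded = {}        # block key -> models currently in memory

    def block_key(self, x, y):
        # Map a world coordinate to its block index.
        return (int(x // self.block_size), int(y // self.block_size))

    def register_model(self, name, x, y):
        self.model_store.setdefault(self.block_key(x, y), []).append(name)

    def update_active_region(self, cx, cy, radius_blocks):
        """Load blocks near (cx, cy); unload everything else."""
        bx, by = self.block_key(cx, cy)
        wanted = {(bx + dx, by + dy)
                  for dx in range(-radius_blocks, radius_blocks + 1)
                  for dy in range(-radius_blocks, radius_blocks + 1)}
        for key in list(self.loaded):        # unload blocks that left
            if key not in wanted:
                del self.loaded[key]
        for key in wanted:                   # load newly visible blocks
            if key not in self.loaded and key in self.model_store:
                self.loaded[key] = list(self.model_store[key])

    def visible_models(self):
        return sorted(m for ms in self.loaded.values() for m in ms)
```

For example, a model registered at (550, 50) with a block size of 100 is only resident after the active region moves near block (5, 0).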

Stereoscopic images in the generation system exploit the slight difference between the images seen by the left and right eyes, from which the brain synthesizes a single image with depth. A virtual camera shoots the simulated scene, left and right images are obtained through coordinate transformations, and parallax control is applied to them; based on this parallax mechanism, OpenGL, OpenCV, DirectX, or a 3D engine draws the three-dimensional terrain (with rendering acceleration). Because a 3D image cannot form in the brain when the parallax is too large, the obtained left and right images undergo parallax control (limiting the separation of corresponding points in the two images). Movement of the eyes shifts the image convergence point and distorts the stereoscopic image, so visual tracking is used to keep the convergence point of the left and right images unchanged, realizing interactive stereoscopic image generation.
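The parallax-control step can be illustrated numerically. The sketch below assumes a parallel virtual-camera pair whose zero-parallax plane sits at `view_dist`; the disparity formula and the clamping rule are a simplified stand-in for the patent's control of corresponding-point separation, not its actual algorithm.

```python
def screen_disparity(depth, eye_sep, view_dist):
    """On-screen disparity of a point for a parallel stereo camera pair.

    depth     : distance of the point from the cameras along the view axis
    eye_sep   : separation of the two virtual cameras
    view_dist : distance to the zero-parallax plane
    A point on the zero-parallax plane has disparity 0; points behind it
    have positive (uncrossed) disparity, points in front negative.
    """
    return eye_sep * (depth - view_dist) / depth


def clamp_eye_separation(depths, eye_sep, view_dist, max_disparity):
    """Shrink the virtual-camera separation until no scene point exceeds
    the maximum fusible disparity (the parallax-control step)."""
    worst = max(abs(screen_disparity(d, eye_sep, view_dist)) for d in depths)
    if worst <= max_disparity:
        return eye_sep
    return eye_sep * max_disparity / worst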

2) Human-computer interaction system

This is the system that controls the three-dimensional terrain scene or three-dimensional model data in the image generation system through gestures, realized via a preset gesture-recognition interface; the interactive digital stereoscopic sand table system is flexible to operate and highly interoperable. Corresponding gestures zoom in, zoom out, pan, rotate, or enter the terrain scene or model data, and gestures distinguishing different control meanings can be configured as needed. A stereo camera captures the three-dimensional information of the gesture; a midpoint compensation algorithm processes the raw sample points, a hidden Markov model is built, and the spatial-distribution features of the gesture are compared against a sample library for similarity to identify the gesture's specific meaning.
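The mapping from recognized gestures to scene operations can be sketched as a dispatch table. The gesture labels (`pinch_out`, `circle`, etc.) and the `Scene` class are hypothetical; the patent only requires that distinct control meanings map to distinguishable gestures.

```python
class Scene:
    """Minimal stand-in for the 3D terrain scene's control surface."""
    def __init__(self):
        self.scale, self.x, self.y, self.angle, self.inside = 1.0, 0, 0, 0, False
    def zoom(self, f):     self.scale *= f
    def pan(self, dx, dy): self.x += dx; self.y += dy
    def rotate(self, deg): self.angle = (self.angle + deg) % 360
    def enter(self):       self.inside = True


def make_dispatcher(scene):
    """Configurable gesture -> control-meaning table (labels assumed)."""
    return {
        "pinch_out": lambda: scene.zoom(1.25),   # zoom in
        "pinch_in":  lambda: scene.zoom(0.8),    # zoom out
        "swipe":     lambda: scene.pan(10, 0),   # pan
        "circle":    lambda: scene.rotate(15),   # rotate
        "point_tap": lambda: scene.enter(),      # enter model/scene
    }


def dispatch(gesture, table):
    """Route a recognized gesture label to its scene operation."""
    action = table.get(gesture)
    if action is None:
        return False          # unrecognized gesture: ignore
    action()
    return True
```

Because the table is plain data, new control meanings can be configured at runtime, matching the "configure corresponding gestures as needed" requirement.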

Natural gesture control also allows moving forward and backward in the three-dimensional scene, raising and lowering the viewpoint, and entering the terrain scene or model data to observe interior details (for example, viewing the inside of a room within a building while still seeing the surrounding buildings and scenery, giving an immersive human-computer interaction experience). Figure 11 shows the gesture operations that can be realized. A non-intrusive gesture-recognition device together with real-time stereoscopic content generation constitutes real-time interactive generation of human-machine stereoscopic images.

A non-intrusive gesture-recognition device is placed on the naked-eye stereoscopic display; through gesture recognition, real-time multi-user interaction with the 3D image is realized. Naked-eye 3D and non-intrusive gesture recognition together achieve highly realistic stereoscopic images and high-precision human-computer interaction, with the sand table model seen by the eye appearing suspended above the display.

For the human-computer interaction, the user's activity area is recorded in real time over a wide viewing angle and large dynamic range, and an inter-frame difference recognition method achieves real-time gesture control of the three-dimensional scene.

With eye tracking and directional-light technology, eye-position detection is contactless, and the user need not wear any auxiliary device.

The system has a basic camera module for gesture detection, a description of gesture motion trajectories based on visual perception, and gesture understanding and recognition algorithms for automatic recognition of specified gestures. Using near-field gesture interaction, multiple gesture-recognition cameras capture gestures, enabling non-intrusive recognition with gesture-positioning accuracy at the centimeter level.

The system uses near-field gesture interaction combined with computer-vision detection of finger positions to guide the stereoscopic image generation module in producing the corresponding stereo pairs, meeting the requirements of controllability and interactivity. The basic gesture-detection camera module records the user's activity area in real time over a wide viewing angle and large dynamic range; the visual-perception-based description of gesture trajectories yields a basic point-cloud description of the gesture; and the understanding and recognition algorithms realize automatic recognition of specified gestures.

According to the user's real-time movements and gesture controls, the naked-eye stereoscopic display updates the scene in real time.

The beneficial effects of the present invention are: because near-field gesture recognition is used, recognition accuracy is high, and fingers can be distinguished from hand-held objects and processed separately. Gesture-positioning accuracy reaches the centimeter level. Because a single gesture recognizer has a limited recognition range, multiple gesture-recognition cameras capture the gestures, gestures in different regions receive different segmentation processing, inter-frame differencing extracts the gesture's motion vector, and a hidden Markov model processes it. A real-time algorithm makes the image change in real time according to the gestures.

Brief Description of the Drawings

Fig. 1 Flowchart of the transformations from the world coordinate system to the camera coordinate system, from the camera coordinate system to the image coordinate system, and from the image coordinate system to the display coordinate system.

Fig. 2 3D height map generated by OpenGL from height information.

Fig. 3 Relationship between the stereoscopic display and human-computer interaction in the present invention.

Fig. 4 Schematic diagram of the converging camera model.

Fig. 5 Schematic diagram of the parallel camera model.

Fig. 6 Schematic diagram of the viewing model.

Fig. 7 Schematic diagram of image distortion.

Fig. 8 Schematic diagrams of four kinds of stereoscopic image distortion, A-D.

Fig. 9 A: image distortion model; B: distortion-correction processing.

Fig. 10 Four examples of gesture detection, A-D.

Fig. 11 Description of gesture actions.

In the figures: PC terminal 1, video interface 2, parallax illumination 3, naked-eye 3D display screen 4, virtual object 5, eye tracker 6, gesture detector 7.

Detailed Description of Embodiments

1) Gesture positioning and understanding

A skin-color-detection-based method extracts the approximate region containing the user's hand, and inter-frame differencing extracts the gesture's motion vector: the difference in gray values between two adjacent frames of continuous video yields the hand's motion region, and stereo-camera image matching determines the three-dimensional coordinates of the fingers. Through these means the gesture's three-dimensional point-cloud data is finally determined.
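The inter-frame difference step above can be sketched on toy gray-value frames. This is a minimal illustration, assuming frames are 2-D lists of 0-255 values and omitting the skin-color and stereo-matching stages.

```python
def frame_difference(prev, curr, threshold):
    """Binary motion mask from the absolute gray-level difference of
    two consecutive frames (each a 2-D list of 0-255 values)."""
    return [[1 if abs(c - p) > threshold else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def motion_bbox(mask):
    """Bounding box (top, left, bottom, right) of the moving region,
    or None when nothing moved."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))
```

The bounding box of the mask is the coarse hand region that the later matching and tracking stages refine.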

Gesture understanding mainly comprises feature extraction on the resampled gesture and analysis of the sample points with a hidden Markov model (HMM), a statistical analysis model. The overall goal of the gesture-recognition module is to build a robust classifier to classify and recognize hand gestures. During resampling, a midpoint compensation algorithm processes the original sample points; the resampled points reflect changes in curvature well while keeping the point set to a manageable size. An efficient HMM is then built, and direction coding ensures the observation sequence has the regularity that HMM modeling requires.
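The resampling and direction-coding steps can be sketched as follows. The uniform resampler is a simplified stand-in for the patent's midpoint compensation algorithm, and the 8-direction quantization is one common choice for producing the HMM observation sequence; neither is taken from the patent itself.

```python
import math


def resample_uniform(points, max_gap):
    """Densify a trajectory so no segment exceeds max_gap (a simplified
    stand-in for the midpoint compensation algorithm)."""
    out = [points[0]]
    for (x1, y1) in points[1:]:
        x0, y0 = out[-1]
        d = math.dist((x0, y0), (x1, y1))
        n = max(1, math.ceil(d / max_gap))
        for i in range(1, n + 1):
            out.append((x0 + (x1 - x0) * i / n, y0 + (y1 - y0) * i / n))
    return out


def direction_codes(points, n_dirs=8):
    """Quantize a trajectory into discrete direction symbols.

    Each consecutive pair of sample points becomes one of n_dirs codes
    (0 = +x, counting counter-clockwise); the code sequence is the
    observation sequence fed to the HMM.
    """
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        codes.append(int(round(angle / (2 * math.pi / n_dirs))) % n_dirs)
    return codes
```

A square stroke, for instance, yields the regular code sequence 0, 2, 4, 6, which is exactly the kind of regularity the HMM observation model relies on.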

2) Realization of gesture interaction technology

The realization of gesture interaction divides into two main parts, gesture tracking and gesture recognition:

Gesture tracking consists of a static gesture-recognition part, a part matching the gesture image to a model, and a tracking part. The static recognition part understands the hand pose in the current frame. Image-to-model matching uses the recognition result of part 1) above: the image containing the gesture is matched with a 2D model to obtain the feature vectors and initial parameters needed for tracking. The tracking part first coarsely locates the hand region, then determines changes in fingertip position, and uses the hand's structural features to locate the palm.

The non-intrusive gesture-recognition technique is a recognition algorithm based on the gesture's spatial distribution features (HDF), an abstract description of the spatial characteristics of the human hand. Most important is extracting skin-color and gesture-space feature vectors and comparing the extracted spatial-distribution features against a sample library for similarity to identify the gesture's specific meaning. Spatial-distribution features are generally extracted from two aspects, overall pose and local pose. Combining the gesture's overall appearance with its joint-movement characteristics to extract spatial-distribution features makes it possible to recognize not only gestures with little mutual distinction but also gestures with a certain amount of bending deformation. Figure 11 gives an example of the gesture-action library after the hidden Markov model is built: it covers the recognition of different gesture actions, including rigid objects held in the hand. The gesture detector extracts the gesture's features, yields the meaning of the gesture command, and passes it through the PC interface, whereupon the PC applies the corresponding operation to the 3D image: zoom in, zoom out, pan, rotate, or enter.

3) Stereoscopic image generation

Stereoscopic image generation is based on the basic principle of two-viewpoint stereoscopic imaging: interactive stereo pairs are generated quickly in real time and output in the stereoscopic image format required by the display terminal. The OpenGL application programming interface provides functions for rendering multiple viewpoints, used to obtain the left and right parallax images correctly. Display occurs on the naked-eye 3D display screen 4 via the PC terminal 1, video interface 2, and parallax illumination 3, with the virtual object 5 on screen and the eye tracker 6 tracking the viewer's eyes.

First a virtual scene is constructed, and left and right virtual cameras simulating the viewer's two eyes capture the left and right images, supplied to the display terminal through the video interface 2 and parallax illumination 3. Two virtual cameras simulate the eyes to acquire the stereo pair, using one of two stereo camera models: the converging model (Fig. 4) and the parallel model (Fig. 5). In the converging model the two cameras' optical axes intersect at a point, which suits close-range shots; in the parallel model the optical axes are parallel, equivalent to intersecting at infinity, which suits distant shots. A mathematical model is built: the world coordinate system has its origin at the midpoint of the line joining the two cameras, each camera coordinate system has its origin at its camera, the image coordinate system has its origin at the center of the camera's CCD projection plane, and the display coordinate system has its origin at the center of the display. Stereo shooting can then be distilled into the transformations from world to camera coordinates, camera to image coordinates, and image to display coordinates, as shown in the flowchart of Fig. 1.
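The chain of transformations can be sketched for the parallel camera model. This is a simplified pinhole sketch under the coordinate conventions just described (origin midway between the cameras, viewing axis along z); the sensor-shift term that places the zero-parallax plane at `view_dist` is an assumption of this sketch, not a formula quoted from the patent.

```python
def project_point(pw, eye_offset, focal, view_dist, screen_scale=1.0):
    """World -> camera -> image -> display for one camera of a parallel
    stereo pair.

    pw         : (x, y, z) in the world frame, z > 0 in front of the cameras
    eye_offset : camera x-offset from the origin (-e/2 left, +e/2 right)
    focal      : focal length of the virtual camera
    view_dist  : distance to the zero-parallax (display) plane
    """
    x, y, z = pw
    # World -> camera: translate by the camera offset (axes stay parallel).
    xc, yc, zc = x - eye_offset, y, z
    # Camera -> image: pinhole projection, plus a horizontal sensor shift
    # so points at z == view_dist land on identical image coordinates.
    u = focal * xc / zc + focal * eye_offset / view_dist
    v = focal * yc / zc
    # Image -> display: scale image coordinates to screen units.
    return (u * screen_scale, v * screen_scale)
```

With this construction, a point on the zero-parallax plane projects to the same display coordinate in both views, while points behind the plane acquire positive (uncrossed) disparity.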

To mitigate ghosting and visual fatigue in stereoscopic images, a separation smaller than the actual interpupillary distance is often used as the image-generation parameter. However, this introduces stereoscopic distortion and reduces the perceived depth of the stereoscopic image, and the distortion is even more pronounced during eye-tracked interactive stereoscopic image generation. In Fig. 7, the black horizontal line represents the zero-parallax plane, the black dots below it the actual object positions, and the red dots the object positions the user perceives; above the zero-parallax plane, the dark blue dots represent the actual eye separation and the light blue dots the configured eye separation, which is smaller than the actual value. When the user moves from the left position to the right, the perceived object position shifts away from the actual position, and the positions perceived before and after the move do not coincide. Fig. 8 shows the effect of this distortion when the user views a stereoscopic image: the black grid represents the actual object and the red grid the perceived stereoscopic image, in four different cases. The greater the difference between the configured interpupillary distance and the actual one, the more severe the distortion.

To eliminate the distortion, a distortion model can be established (Fig. 9, left): the origin is the center of the zero-parallax plane, the Z axis is perpendicular to that plane, and the X axis is horizontal. In the figure the actual inter-eye vector is 2D, the inter-eye vector assumed by the computer is rD, the midpoint of the eyes is I, the actual object position is point E, and the position perceived by the user is point F. A transformation matrix can be defined that maps E to F in the world coordinate system, i.e. F = Δ(E). Applying the inverse transformation Δ⁻¹ in the coordinate system effectively reduces the distortion: in Fig. 9 (right), the red grid is the distorted image and the blue grid is the image after inverse-distortion processing, which visibly restores the true scene.
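Under a simplified 2-D version of this model (viewer and object in the x-z plane, screen at z = 0, viewer at z = view_dist), the mapping F = Δ(E) can be computed directly: project E onto the screen through the *assumed* eye positions, then intersect the rays from the *actual* eyes through those screen points. The function names and the numeric values below are illustrative assumptions, not from the patent:

```python
def screen_x(eye_x, p, view_dist):
    """Where the ray from an eye at (eye_x, view_dist) through the point
    p = (x, z) crosses the zero-parallax plane z = 0."""
    t = view_dist / (view_dist - p[1])
    return eye_x + t * (p[0] - eye_x)

def perceived_point(E, e_render, e_actual, view_dist, center=0.0):
    """F = Delta(E): rendered with assumed separation e_render,
    viewed with actual separation e_actual."""
    sL = screen_x(center - e_render / 2, E, view_dist)
    sR = screen_x(center + e_render / 2, E, view_dist)
    # Intersect the two viewing rays (actual eyes through the drawn screen points).
    aL, aR = center - e_actual / 2, center + e_actual / 2
    # Ray k: (x, z) = (s_k, 0) + t * (a_k - s_k, view_dist); equate the two rays.
    t = (sL - sR) / ((aR - sR) - (aL - sL))
    return (sL + t * (aL - sL), t * view_dist)
```

With a 600 mm viewing distance, an assumed separation of 60 mm and an actual one of 65 mm, a point 200 mm behind the screen is perceived at a depth of only 180 mm, matching the reduced-depth effect described above; when the two separations agree, F coincides with E.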

4) Image acceleration based on the parallax mechanism

In the virtual scene, the terrain is rendered from the 3D coordinates of a set of points: OpenGL draws triangles over these points, and texture mapping then completes the map. All vertex data can be uploaded to the GPU during initialization, so rendering does not have to read main memory repeatedly. Before rendering, each terrain block is subjected to a culling test: distant parts of the map are drawn with fewer triangles, and blocks outside the field of view are not drawn at all, which speeds up the rendering process.
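The culling step can be illustrated as follows. This is a generic axis-aligned-bounding-box test against frustum planes plus a crude distance-based triangle budget, not code from the patent; in the real system the test would run against the six planes of the OpenGL view frustum, and the names `aabb_outside_plane`, `visible_blocks`, and `lod_triangle_budget` are my own:

```python
def aabb_outside_plane(box_min, box_max, plane):
    """True if the whole AABB lies on the negative side of the plane n.x + d = 0.
    Only the corner most aligned with the plane normal needs testing."""
    n, d = plane
    px = box_max[0] if n[0] >= 0 else box_min[0]
    py = box_max[1] if n[1] >= 0 else box_min[1]
    pz = box_max[2] if n[2] >= 0 else box_min[2]
    return n[0] * px + n[1] * py + n[2] * pz + d < 0

def visible_blocks(blocks, frustum_planes):
    """Keep only the terrain blocks whose bounding box intersects the frustum."""
    return [b for b in blocks
            if not any(aabb_outside_plane(b["min"], b["max"], pl)
                       for pl in frustum_planes)]

def lod_triangle_budget(distance, full=4096):
    """Halve the triangle budget for each doubling of distance (a simple LOD rule)."""
    level, d = 0, distance
    while d > 100.0 and level < 6:
        d /= 2.0
        level += 1
    return max(full >> level, 64)
```

Blocks that survive `visible_blocks` are drawn with a triangle count chosen by `lod_triangle_budget`, so nearby terrain stays detailed while distant or off-screen terrain costs little.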

Example 1:

In the actual device, the gesture recognition device is a Leap Motion. From the images captured at different angles by its two built-in cameras, the Leap Motion sensor reconstructs the motion of the hand in real-world 3D space. The detection range is roughly 25 mm to 600 mm above the sensor, and the detection volume is approximately an inverted pyramid.

First, the Leap Motion sensor establishes a Cartesian coordinate system: the origin is the center of the sensor, the X axis is parallel to the sensor and points to the right of the screen, the Y axis points upward, and the Z axis points away from the screen. Units are real-world millimetres. In operation the sensor periodically sends hand-motion information; each such packet is called a "frame". Each frame contains, for everything detected:

·a list of all hands (palms) and their data;

·a list of all fingers and their data;

·a list of hand-held tools (thin, straight objects longer than a finger, such as a pen) and their data;

·a list of all pointable objects (Pointable Objects), i.e. all fingers and tools, and their data.

The Leap sensor assigns each of these a unique identifier (ID), which does not change as long as the hand, finger, or tool stays within the field of view. Given an ID, the data of each tracked object can be queried through functions such as Frame::hand() and Frame::finger().
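The per-frame data and the ID-based lookup can be mirrored in a small sketch. The Python classes below are an illustrative stand-in for the SDK's Frame, Hand, and Pointable types — the fields and method names are simplified assumptions, not the actual Leap API:

```python
from dataclasses import dataclass, field

@dataclass
class Pointable:
    id: int
    tip_position: tuple   # millimetres in the sensor's coordinate system
    is_tool: bool = False # tools are thin, straight objects such as a pen

@dataclass
class Hand:
    id: int
    palm_position: tuple
    finger_ids: list = field(default_factory=list)

@dataclass
class Frame:
    hands: dict = field(default_factory=dict)       # id -> Hand
    pointables: dict = field(default_factory=dict)  # id -> Pointable

    def hand(self, hand_id):
        """Look a hand up by its persistent ID, as Frame::hand() does in C++."""
        return self.hands.get(hand_id)

    def finger(self, pointable_id):
        """Return the pointable only if it is a finger, not a tool."""
        p = self.pointables.get(pointable_id)
        return p if p is not None and not p.is_tool else None

# Illustrative frame: one hand holding up one finger, plus a detected pen.
frame = Frame()
frame.hands[1] = Hand(1, (0.0, 200.0, 0.0), [7])
frame.pointables[7] = Pointable(7, (10.0, 250.0, -5.0))
frame.pointables[8] = Pointable(8, (0.0, 300.0, 0.0), is_tool=True)
```

Because the IDs persist from frame to frame, the same lookup in consecutive frames tracks one physical finger over time, which is what the motion classification below relies on.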

Motion information is then generated from the data of each frame and the previous frame. For example, if two hands are detected and both move in the same direction, the motion is classified as a translation; if they turn as though holding a ball, it is recorded as a rotation; and if the two hands move together or apart, it is recorded as a scaling. The generated data contain:

·the axis vector of the rotation;

·the rotation angle (clockwise positive);

·the matrix describing the rotation;

·the scale factor;

·the translation vector.
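The classification rules above (same-direction motion → translation, turning → rotation, approach/separation → scaling) can be sketched as a per-frame test on the two palm positions. The thresholds and the rotation convention (angle of the inter-palm axis in the x-z plane, clockwise positive) are illustrative choices, not values from the patent or the SDK:

```python
import math

def classify_two_hand_motion(prev, curr, move_eps=5.0, scale_eps=0.05, rot_eps=0.1):
    """Classify the motion between two frames.

    prev, curr: ((xL, yL, zL), (xR, yR, zR)) palm positions in millimetres.
    Returns (kind, value): a scale factor, a rotation angle in radians
    (clockwise positive), a translation vector, or ("idle", None).
    """
    (pl, pr), (cl, cr) = prev, curr
    # Separation change -> pinch/spread scaling.
    scale = math.dist(cl, cr) / math.dist(pl, pr)
    if abs(scale - 1.0) > scale_eps:
        return "scale", scale
    # Rotation of the axis joining the palms, projected onto the x-z plane.
    a_prev = math.atan2(pr[2] - pl[2], pr[0] - pl[0])
    a_curr = math.atan2(cr[2] - cl[2], cr[0] - cl[0])
    angle = a_prev - a_curr  # clockwise positive, matching the text
    if abs(angle) > rot_eps:
        return "rotate", angle
    # Both hands moving together -> translation of their midpoint.
    shift = tuple((cl[i] + cr[i]) / 2 - (pl[i] + pr[i]) / 2 for i in range(3))
    if math.dist(shift, (0.0, 0.0, 0.0)) > move_eps:
        return "translate", shift
    return "idle", None
```

Checking scaling before rotation and translation makes the three categories mutually exclusive per frame, which keeps the downstream scene control unambiguous.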

Example 2:

The actual finished product is shown in Fig. 3. It integrates a 55-inch 3D LCD screen, precise 3D gesture-positioning elements, a gesture co-processor, and a computer, with a configurable base and an integrated seamless design. Only the screen is exposed; the 3D gesture-positioning elements and the gesture co-processor (the gesture detector) are concealed around the perimeter of the screen.

As shown in Fig. 3, as the user moves left, right, forward, or backward in front of the display, the eye-tracking device monitors the user's position in real time; at the same time, the gesture detector senses the operating posture of the user's hands. The eye positions and gesture data are transmitted to the PC in real time, and the PC generates the corresponding pair of stereoscopic images according to the user's viewing point and hand motions and shows them on the screen. Because both eye-position and gesture detection are non-contact, the user need wear no auxiliary device on the head or hands, achieving non-intrusive human-computer interaction.
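One pass of this tracking-and-rendering loop can be sketched as follows; the scene-state fields and the render callback are hypothetical placeholders, since the patent does not name them:

```python
def interaction_step(scene, eye_position, gesture, render):
    """One pass of the non-intrusive loop: the recognised gesture updates the
    scene state, then a stereo pair is re-rendered for the tracked eye position.
    All names here are illustrative, not from the patent."""
    kind, value = gesture
    if kind == "translate":
        scene["offset"] = tuple(o + v for o, v in zip(scene["offset"], value))
    elif kind == "scale":
        scene["zoom"] *= value
    elif kind == "rotate":
        scene["heading"] += value
    # The render callback stands in for stereo-pair generation on the display.
    return render(scene, eye_position)

scene = {"offset": (0.0, 0.0, 0.0), "zoom": 1.0, "heading": 0.0}
frame_out = interaction_step(scene, (0.0, 0.0, 600.0), ("scale", 2.0),
                             lambda s, eye: (s["zoom"], eye))
```

Driving the render call from both inputs at once is what couples the two subsystems: a head movement alone changes the viewpoint, a gesture alone changes the scene, and either triggers a fresh stereo pair.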

Although the present invention has been disclosed above by way of preferred embodiments, they are not intended to limit it. Those of ordinary skill in the art may make various changes and modifications without departing from the spirit and scope of the invention; the scope of protection is therefore defined by the claims.

Claims (9)

1. An interactive digital stereoscopic sand table system, characterized in that it consists of two major parts, a stereoscopic image generation system and a human-computer interaction system:

1) Stereoscopic image generation system

The stereoscopic image generation system draws 3D terrain with Opengl, Opencv, Directx or a 3D engine, applies eye tracking and directional-light technology to achieve unaided stereoscopic display, and uses a series of image acceleration techniques to speed up rendering; 3D building models are placed in the target region of the 3D terrain with precise spatial distribution and good visual effect; the stereoscopic image generation system constitutes a naked-eye stereoscopic display.

The system is independently programmed and the sand table model is rich in detail: it displays not only the whole terrain but also the military models used in war, including tanks and aircraft, with vivid light-and-shadow effects of battle. The preset gesture-recognition interface controls the 3D terrain scene or 3D model data through gestures. Because the terrain and military models are very complex, the program's data volume is very large. For such a volume, the system adopts an innovative data management scheme: a unified 3D terrain scene with block-partitioned 3D military model data loaded in layers, so that the system is both an organic whole and fast-running, solving the problems of massive data and operating efficiency.

Stereoscopic images are obtained by exploiting the subtle differences between the images seen by the left and right eyes, from which the brain fuses an image with depth. Virtual cameras shoot the simulated scene; the left and right images are obtained through coordinate transformation and subjected to parallax control; based on the parallax mechanism, the 3D terrain is drawn (with rendering acceleration) using Opengl, Opencv, Directx or a 3D engine. Because a 3D image cannot be fused in the brain when the parallax is too large, the spacing of corresponding points in the left and right images is controlled; and because movement of the eyes shifts the convergence point and distorts the stereoscopic image, visual tracking keeps the convergence point of the left and right images unchanged, thereby generating interactive stereoscopic images.

2) Human-computer interaction system

This is the system that controls the 3D terrain scene or 3D model data of the stereoscopic image generation system through gestures, implemented through the preset gesture-recognition interface; the corresponding gestures control zoom-in, zoom-out, panning, rotation, entry and so on of the 3D terrain scene or 3D model data, and gestures that distinguish different control meanings are configured as needed. A stereo camera captures the 3D information of the gesture, a midpoint compensation algorithm processes the raw sample points, a hidden Markov model is established, and the spatial-distribution features of the gesture are compared with a sample library for similarity to identify the specific meaning of the gesture.

It also includes, through natural gesture control, moving forward and backward in the 3D scene, raising and lowering the viewpoint, and entering the 3D terrain scene or 3D model data to observe interior details.

The gesture-recognition interface connects to a non-intrusive gesture recognition or monitoring device; gesture recognition together with real-time stereoscopic content generation constitutes real-time generation of, and interaction with, human-machine stereoscopic images.

2. The interactive digital stereoscopic sand table system of claim 1, characterized in that a non-intrusive gesture recognition device is placed at the naked-eye stereoscopic display; naked-eye 3D technology and non-intrusive gesture recognition achieve a highly realistic stereoscopic picture and high-precision human-computer interaction, and the sand table model seen by the eye appears to float above the display.

3. The interactive digital stereoscopic sand table system of claim 1, characterized in that the human-computer interaction part has a basic camera module for gesture detection; multiple cameras or gesture-recognition elements extract several gestures simultaneously over the whole screen; an inter-frame difference method separates out the gestures of different users, and gesture-recognition feedback is displayed on the image in real time, achieving real-time gesture control of the 3D scene.

4. The interactive digital stereoscopic sand table system of claim 1, characterized in that eye tracking and directional-light technology are used, and detection of the eye position is non-contact.

5. The interactive digital stereoscopic sand table system of claim 1, characterized in that it has a basic camera module for gesture detection, a visual-perception-based description of gesture trajectories, and a gesture understanding and recognition algorithm, namely one based on the spatial distribution features of gestures; the hand distribution feature (HDF) is an abstract description of the spatial characteristics of the human hand. The key steps are extracting skin color and the feature vector of the gesture space and comparing the extracted spatial-distribution features with a sample library to identify the specific meaning of the gesture. Spatial-distribution features are generally extracted from two aspects, the overall posture and the local posture; the overall appearance of the gesture is then combined with its joint-change characteristics to extract the spatial-distribution features, achieving automatic recognition of the defined gestures.

6. The interactive digital stereoscopic sand table system of claim 1, characterized in that near-field gesture interaction is used: several gesture-recognition cameras capture the gestures, computer-vision detection locates the fingers, and techniques such as skin-color recognition extract the gesture motion; guided by the hidden Markov model, the stereoscopic image generation module produces the corresponding stereoscopic image pair, and comparison of the gesture with preset models feeds back the corresponding image-change signal, making the system controllable and interactive.

7. The interactive digital stereoscopic sand table system of claim 1, characterized in that it provides a visual-perception-based description of gesture trajectories, forming a basic point-cloud description of the gesture; it has a gesture understanding and recognition algorithm achieving automatic recognition of the defined gestures; and following the user's real-time movements and gesture-change controls, the naked-eye stereoscopic display updates the scene in real time.

8. The interactive digital stereoscopic sand table system of claim 1, characterized in that, to eliminate distortion, a distortion model is established with the center of the zero-parallax plane as the origin, the Z axis perpendicular to that plane, and the X axis horizontal; the actual inter-eye vector is 2D, the computer-assumed inter-eye vector is rD, the midpoint of the eyes is I, the actual object position is point E, and the user-perceived object position is point F; a transformation matrix is defined that maps E to F in the world coordinate system, i.e. F = Δ(E).

9. The interactive digital stereoscopic sand table system of claim 1, characterized in that the distortion problem is effectively reduced through the transformation in the coordinate system.
CN201410193904.0A — Interactive digital stereoscopic sand table system (filed 2014-05-08, priority 2014-05-08; publication CN104050859A, status: pending)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN201410193904.0A | 2014-05-08 | 2014-05-08 | Interactive digital stereoscopic sand table system

Publications (1)

Publication Number | Publication Date
CN104050859A (en) | 2014-09-17

Family

ID=51503610

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN201410193904.0A (pending, CN104050859A) | Interactive digital stereoscopic sand table system | 2014-05-08 | 2014-05-08

Country Status (1)

Country | Link
CN (1) | CN104050859A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN101034208A (en)* | 2007-04-04 | 2007-09-12 | 大连东锐软件有限公司 | Three-dimensional simulation sand table system
EP1879129A1 (en)* | 2006-07-13 | 2008-01-16 | Northrop Grumman Corporation | Gesture recognition simulation system and method
JP2009211563A (en)* | 2008-03-05 | 2009-09-17 | Tokyo Metropolitan Univ | Image recognition device, image recognition method, image recognition program, gesture operation recognition system, gesture operation recognition method, and gesture operation recognition program
CN101783966A (en)* | 2009-01-21 | 2010-07-21 | 中国科学院自动化研究所 | Real three-dimensional display system and display method
CN202694599U (en)* | 2012-07-23 | 2013-01-23 | 上海风语筑展览有限公司 | Digital three-dimensional sand table display device


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ZACHARY WARTELL et al.: "Balancing Fusion, Image Depth and Distortion in Stereoscopic Head-Tracked Displays", SIGGRAPH 99 Conference Proceedings *
杨宇航 et al.: "Electronic sand table simulation system based on virtual reality technology" (基于虚拟现实技术的电子沙盘仿真系统), Computer Simulation (计算机仿真), vol. 20, no. 1, 31 January 2003 *
杨智勋: "Research and implementation of a 3D electronic sand table system" (三维电子沙盘系统的研究与实现), China Master's Theses Full-text Database, Information Science and Technology, no. 05, 15 May 2012 *
杨波 et al.: "Gesture recognition algorithm based on spatial distribution features under complex backgrounds" (复杂背景下基于空间分布特征的手势识别算法), Journal of Computer-Aided Design & Computer Graphics (计算机辅助设计与图形学学报), no. 10, 31 October 2010, pages 1-8 *

Cited By (53)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN104571510A (en)* | 2014-12-30 | 2015-04-29 | 青岛歌尔声学科技有限公司 | Gesture input system and method in 3D scene
US10482670B2 | 2014-12-30 | 2019-11-19 | Qingdao Goertek Technology Co., Ltd. | Method for reproducing object in 3D scene and virtual reality head-mounted device
US10466798B2 | 2014-12-30 | 2019-11-05 | Qingdao Goertek Technology Co., Ltd. | System and method for inputting gestures in 3D scene
CN104898394A (en)* | 2015-04-27 | 2015-09-09 | 华北电力大学 | Holographic projection technology-based man-machine interaction system and method
CN104898394B (en)* | 2015-04-27 | 2019-01-15 | 华北电力大学 | Man-machine interactive system and method based on line holographic projections technology
CN104820497A (en)* | 2015-05-08 | 2015-08-05 | 东华大学 | A 3D interaction display system based on augmented reality
CN104820497B (en)* | 2015-05-08 | 2017-12-22 | 东华大学 | A kind of 3D interactive display systems based on augmented reality
CN105045389B (en)* | 2015-07-07 | 2018-09-04 | 深圳水晶石数字科技有限公司 | A kind of demenstration method of interactive sand table system
CN105045389A (en)* | 2015-07-07 | 2015-11-11 | 深圳水晶石数字科技有限公司 | Demonstration method for interactive sand table system
CN105511602A (en)* | 2015-11-23 | 2016-04-20 | 合肥金诺数码科技股份有限公司 | 3d virtual roaming system
TWI559269B (en)* | 2015-12-23 | 2016-11-21 | 國立交通大學 | System, method, and computer program product for simulated reality learning
CN107945270A (en)* | 2016-10-12 | 2018-04-20 | 阿里巴巴集团控股有限公司 | A kind of 3-dimensional digital sand table system
CN106980366A (en)* | 2017-02-27 | 2017-07-25 | 合肥安达创展科技股份有限公司 | Landform precisely catches system and fine high definition projection imaging system
CN107168516B (en)* | 2017-03-31 | 2019-10-11 | 浙江工业大学 | Data visualization method of global climate vector field based on VR and gesture interaction technology
CN106803932A (en)* | 2017-03-31 | 2017-06-06 | 合肥安达创展科技股份有限公司 | A kind of utilization dynamic recognition technique and the method for image fusion technology interactive demonstration
CN107168516A (en)* | 2017-03-31 | 2017-09-15 | 浙江工业大学 | Global climate vector field data method for visualizing based on VR and gesture interaction technology
CN107479706A (en)* | 2017-08-14 | 2017-12-15 | 中国电子科技集团公司第二十八研究所 | A kind of battlefield situation information based on HoloLens is built with interacting implementation method
CN107479706B (en)* | 2017-08-14 | 2020-06-16 | 中国电子科技集团公司第二十八研究所 | A method for constructing and interacting with battlefield situation information based on HoloLens
CN109426783A (en)* | 2017-08-29 | 2019-03-05 | 深圳市掌网科技股份有限公司 | Gesture identification method and system based on augmented reality
CN108389432A (en)* | 2018-01-29 | 2018-08-10 | 盎锐(上海)信息科技有限公司 | Science displaying device based on shadow casting technique and projecting method
CN108398049A (en)* | 2018-04-28 | 2018-08-14 | 上海亿湾特训练设备科技有限公司 | A kind of mutual war formula projection confrontation fire training system of networking
CN108398049B (en)* | 2018-04-28 | 2023-12-26 | 上海亿湾特训练设备科技有限公司 | Networking mutual-combat type projection antagonism shooting training system
CN108737811A (en)* | 2018-05-25 | 2018-11-02 | 卓谨信息科技(常州)有限公司 | Sandbox scene interaction system based on Kinect and sandbox scene creation method
CN108874133A (en)* | 2018-06-12 | 2018-11-23 | 南京绿新能源研究院有限公司 | Interactive monitoring sand table system for distributed photoelectric station monitoring room
CN111459264A (en)* | 2018-09-18 | 2020-07-28 | 阿里巴巴集团控股有限公司 | 3D object interaction system and method and non-transitory computer readable medium
CN111459264B (en)* | 2018-09-18 | 2023-04-11 | 阿里巴巴集团控股有限公司 | 3D object interaction system and method and non-transitory computer readable medium
CN109493655A (en)* | 2018-12-21 | 2019-03-19 | 广州亚普机电设备科技有限公司 | A kind of power battery 3D virtual emulation interaction experience system
CN109857260A (en)* | 2019-02-27 | 2019-06-07 | 百度在线网络技术(北京)有限公司 | Control method, device and system of three-dimensional interactive image
CN110163831B (en)* | 2019-04-19 | 2021-04-23 | 深圳市思为软件技术有限公司 | Method and device for dynamically displaying object of three-dimensional virtual sand table and terminal equipment
CN110163831A (en)* | 2019-04-19 | 2019-08-23 | 深圳市思为软件技术有限公司 | The object Dynamic Display method, apparatus and terminal device of three-dimensional sand table
CN110609616A (en)* | 2019-06-21 | 2019-12-24 | 哈尔滨拓博科技有限公司 | Stereoscopic projection sand table system with intelligent interaction function
CN110767063A (en)* | 2019-11-08 | 2020-02-07 | 浙江浙能技术研究院有限公司 | Non-contact interactive electronic sand table and working method
CN111953956A (en)* | 2020-08-04 | 2020-11-17 | 山东金东数字创意股份有限公司 | Naked eye three-dimensional special-shaped image three-dimensional camera generation system and method thereof
CN111953956B (en)* | 2020-08-04 | 2022-04-12 | 山东金东数字创意股份有限公司 | Naked eye three-dimensional special-shaped image three-dimensional camera generation system and method thereof
CN112040215A (en)* | 2020-08-30 | 2020-12-04 | 河北军云软件有限公司 | Naked eye stereoscopic display system in electromagnetic environment
CN112379777A (en)* | 2020-11-23 | 2021-02-19 | 南京科盈信息科技有限公司 | Digital exhibition room gesture recognition system based on target tracking
CN112885222B (en)* | 2021-01-25 | 2022-07-19 | 中国石油大学胜利学院 | A kind of AI interactive simulation sand table and its use method
CN112885222A (en)* | 2021-01-25 | 2021-06-01 | 中国石油大学胜利学院 | Novel AI interactive simulation sand table and use method thereof
CN113096515A (en)* | 2021-03-31 | 2021-07-09 | 江西交通职业技术学院 | Adjustable sand table simulation device
CN113296604B (en)* | 2021-05-24 | 2022-07-08 | 北京航空航天大学 | True 3D gesture interaction method based on convolutional neural network
CN113296604A (en)* | 2021-05-24 | 2021-08-24 | 北京航空航天大学 | True 3D gesture interaction method based on convolutional neural network
CN113687716A (en)* | 2021-07-29 | 2021-11-23 | 华东计算技术研究所(中国电子科技集团公司第三十二研究所) | Data 3D visualization platform system and method based on intelligent interactive technology
CN115061606A (en)* | 2022-02-14 | 2022-09-16 | 邹良伍 | A naked eye 3D immersive experience device
CN115061606B (en)* | 2022-02-14 | 2025-04-04 | 围美辣妈(北京)健康管理股份有限公司 | A naked eye 3D immersive experience device
CN114822121A (en)* | 2022-02-23 | 2022-07-29 | 交通运输部天津水运工程科学研究所 | Suspension interaction system for harbor accident emergency training
CN114931746A (en)* | 2022-05-12 | 2022-08-23 | 南京大学 | Interaction method, device and medium for 3D game based on pen type and touch screen interaction
CN114931746B (en)* | 2022-05-12 | 2023-04-07 | 南京大学 | Interaction method, device and medium for 3D game based on pen type and touch screen interaction
CN115454245A (en)* | 2022-09-15 | 2022-12-09 | 未石互动科技股份有限公司 | AR-based digital sand table interaction method and system
CN116386440A (en)* | 2023-06-05 | 2023-07-04 | 国网经济技术研究院有限公司 | Infrastructure intelligent control numerical simulation sand table system based on mechanical driving
CN116386440B (en)* | 2023-06-05 | 2025-07-15 | 国网经济技术研究院有限公司 | Infrastructure intelligent control numerical simulation sand table system based on mechanical driving
CN117176936A (en)* | 2023-09-22 | 2023-12-05 | 南京大学 | A freely expandable three-dimensional digital sandbox system and light field rendering method
CN117176936B (en)* | 2023-09-22 | 2025-03-11 | 南京大学 | A freely expandable stereoscopic digital sandbox system and light field rendering method
CN120388506A (en)* | 2025-06-27 | 2025-07-29 | 北京奥特维科技有限公司 | A three-dimensional electronic sandbox based on gesture recognition

Similar Documents

Publication | Title
CN104050859A (en) | Interactive digital stereoscopic sand table system
CN112771539B (en) | Using 3D data predicted from 2D images with neural networks for 3D modeling applications
US20210166495A1 (en) | Capturing and aligning three-dimensional scenes
Hilliges et al. | HoloDesk: direct 3D interactions with a situated see-through display
CN103336575B (en) | Intelligent glasses system and method for human-computer interaction
CN104317391B (en) | Stereoscopic-vision-based three-dimensional palm gesture recognition interaction method and system
JP4768196B2 (en) | Apparatus and method for pointing at a target by image processing without three-dimensional modeling
CN100407798C (en) | 3D geometric modeling system and method
CN102638653B (en) | Automatic face tracking method based on Kinect
WO2017075932A1 (en) | Gesture-based control method and system based on three-dimensional display
CN106708270B (en) | Virtual reality device display method and apparatus, and virtual reality device
EP3533218B1 (en) | Simulating depth of field
CN109145802B (en) | Kinect-based multi-person gesture human-computer interaction method and device
CN107004279A (en) | Natural user interface camera calibration
CN111275731B (en) | Projection-type physical interaction desktop system and method for middle-school experiments
CN103793060A (en) | User interaction system and method
JP2011022984A (en) | Stereoscopic video interactive system
CN101072366A (en) | Autostereoscopic display system and method based on light field and binocular vision technology
CN102801994A (en) | Physical image information fusion device and method
CN117369233B (en) | Holographic display method, device, equipment and storage medium
CN108305321B (en) | Real-time reconstruction method and device for a three-dimensional human hand 3D skeleton model based on a binocular color imaging system
CN106125927B (en) | Image processing system and method
CN113434046A (en) | Three-dimensional interaction system, method, computer device and readable storage medium
CN118092647A (en) | Three-dimensional model processing method and device based on dynamic gesture recognition
CN117435055A (en) | Gesture-enhanced eye-tracking human-computer interaction method based on spatial stereoscopic display

Legal Events

Date | Code | Title | Description
C06 | Publication
PB01 | Publication
C10 | Entry into substantive examination
SE01 | Entry into force of request for substantive examination
RJ01 | Rejection of invention patent application after publication
RJ01 | Rejection of invention patent application after publication | Application publication date: 2014-09-17

