Disclosure of Invention
The invention provides a virtual remote tourism experience system and a corresponding method, wherein the system comprises:
an unmanned aerial vehicle (UAV) equipped with a binocular camera, for acquiring panoramic video frames in real time;
a sight-tracking module, for obtaining the user's gaze direction using a trained first deep neural network model;
a flight control module, which steers the UAV using the tracked gaze direction, thereby providing remote flight control of the UAV;
a server processing module, for processing the panoramic video frames collected by the binocular camera and sending them to a naked-eye 3D display screen; and
the naked-eye 3D display screen, for displaying the video and providing the user with an immersive flight experience.
Optionally, the UAV communicates with the server processing module at high speed through a 5G communication module, which transmits the panoramic video frame data acquired by the binocular camera to the server processing module.
Optionally, the experience system further includes a 3D loudspeaker module for playing audio captured during the UAV's flight.
Optionally, the flight control module controls the UAV's flight direction and speed using the gaze direction and the gaze concentration: the higher the gaze concentration, the faster the forward speed, while the flight direction follows the gaze direction.
Optionally, the system further comprises a console module through which the UAV is remotely controlled via its operating buttons; and/or a voice control module that remotely controls the UAV through voice commands; and/or a gesture control module that remotely controls the UAV through recognized gestures.
Correspondingly, the invention also provides a virtual remote tourism experience method, comprising the following steps:
acquiring panoramic video frames in real time with a UAV equipped with a binocular camera;
obtaining the user's gaze direction with a sight-tracking module, the gaze direction being obtained using a trained first deep neural network model;
remotely controlling the UAV's flight with a flight control module, which steers the UAV using the tracked gaze direction;
processing the panoramic video frames collected by the binocular camera with a server processing module and sending them to a naked-eye 3D display screen; and
displaying the video on the naked-eye 3D display screen, providing the user with an immersive flight experience.
Optionally, the UAV communicates with the server processing module at high speed through a 5G communication module, which transmits the panoramic video frame data acquired by the binocular camera to the server processing module.
Optionally, the method further includes: playing, with a 3D loudspeaker module, audio captured during the UAV's flight.
Optionally, the method further includes: controlling, with the flight control module, the UAV's flight direction and speed using the gaze direction and the gaze concentration: the higher the gaze concentration, the faster the forward speed, while the flight direction follows the gaze direction.
Optionally, the method further includes: remotely controlling the UAV through the operating buttons of a console module; and/or remotely controlling the UAV through voice commands via a voice control module; and/or remotely controlling the UAV through recognized gestures via a gesture control module.
Advantageous effects:
1) The virtual remote tourism experience system and method provided by the invention introduce a UAV into virtual tourism, avoiding the single-scene limitation of traditional virtual tourism: destinations such as Guangzhou Tower, Baiyunshan, and Zhujiang day/night sightseeing can all be flown, and because the UAV flies flexibly, it can observe at close and/or long range at any time according to the user's viewpoint.
2) The invention uses deep learning to implement gaze tracking and thereby control the flight state of the UAV, in both speed and direction. The deep learning method tracks the user's gaze and converts it into flight control commands, giving participants an immersive, human-machine-integrated virtual tourism experience.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and embodiments.
As shown in FIG. 1, the present invention provides a virtual remote tourism experience system, comprising:
an unmanned aerial vehicle (UAV) equipped with a binocular camera, for acquiring panoramic video frames in real time;
a sight-tracking module, for obtaining the user's gaze direction using a trained first deep neural network model;
a flight control module, which steers the UAV using the tracked gaze direction, thereby providing remote flight control of the UAV;
a server processing module, for processing the panoramic video frames collected by the binocular camera and sending them to a naked-eye 3D display screen; and
the naked-eye 3D display screen, for displaying the video and providing the user with an immersive flight experience.
Optionally, the UAV communicates with the server processing module at high speed through a 5G communication module, which transmits the panoramic video frame data acquired by the binocular camera to the server processing module.
Optionally, the experience system further includes a 3D loudspeaker module for playing audio captured during the UAV's flight.
Optionally, the flight control module controls the UAV's flight direction and speed using the gaze direction and the gaze concentration: the higher the gaze concentration, the faster the forward speed, while the flight direction follows the gaze direction.
Optionally, the system further comprises a console module through which the UAV is remotely controlled via its operating buttons; and/or a voice control module that remotely controls the UAV through voice commands; and/or a gesture control module that remotely controls the UAV through recognized gestures.
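The gaze-based speed and direction control described above can be sketched as follows. This is a minimal illustration only: the function name, the 1..n_levels concentration scale, the linear speed law, and the maximum speed are assumptions, not part of the invention.

```python
# Hypothetical mapping from a tracked gaze direction and a gaze-concentration
# level to a drone velocity command. The 1..n_levels scale, the linear speed
# law, and the maximum speed are illustrative assumptions.

def gaze_to_command(yaw_deg, pitch_deg, concentration_level,
                    n_levels=7, max_speed_mps=8.0):
    """Return (heading_yaw_deg, heading_pitch_deg, speed_mps).

    The flight direction follows the gaze direction; a lower concentration
    level number means higher concentration and therefore a faster flight
    speed, matching the level convention described in the text.
    """
    if not 1 <= concentration_level <= n_levels:
        raise ValueError("concentration level out of range")
    # Level 1 -> full speed; level n_levels -> slowest (linear scale).
    speed = max_speed_mps * (n_levels - concentration_level + 1) / n_levels
    return yaw_deg, pitch_deg, speed
```

For example, a fully concentrated gaze (level 1) yields the maximum speed, while level n_levels yields the slowest.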
Optionally, where the control distance permits, the user may provide his or her own UAV; otherwise, a UAV can be rented at the scenic spot: after paying, the user rents a UAV whose management is handled by the scenic spot's integrated control center. For example, when several UAVs in flight may come into positional conflict, the center raises an alarm in advance and assigns flight positions and speeds that satisfy the separation requirements; in an emergency, the lessor directly takes over flight authority so that different UAVs do not collide, and once the risk is resolved, control is returned to the user.
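A conflict check of the kind the control center would run can be sketched as follows; the data layout, function name, and the 15 m separation threshold are assumptions made purely for illustration.

```python
import math

# Hypothetical conflict check for the scenic-spot integrated control centre:
# flag pairs of UAVs whose positions are closer than a minimum separation,
# so that an alarm can be raised in advance. The dict layout and the 15 m
# threshold are illustrative assumptions.

def conflict_pairs(drones, min_sep_m=15.0):
    """drones: dict mapping UAV id -> (x, y, z) position in metres."""
    ids = sorted(drones)
    flagged = []
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            # Euclidean distance between the two UAVs' positions.
            if math.dist(drones[ids[a]], drones[ids[b]]) < min_sep_m:
                flagged.append((ids[a], ids[b]))
    return flagged
```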
Optionally, the first deep neural network is used to track the gaze, determine its change of state, and convert that change into a flight control command, for example: left, right, up, down, and so on.
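The conversion from a gaze change of state to a discrete flight command can be illustrated as below; the dead-zone threshold and the (yaw, pitch) tuple convention are assumptions, not details given in the text.

```python
# Illustrative conversion of a change in gaze direction into one of the
# discrete flight commands named in the text (left/right/up/down). The
# dead-zone threshold and the (yaw, pitch) convention are assumptions.

def gaze_delta_to_command(prev, curr, deadzone_deg=2.0):
    """prev/curr: (yaw_deg, pitch_deg) gaze directions from two frames."""
    dyaw = curr[0] - prev[0]
    dpitch = curr[1] - prev[1]
    if abs(dyaw) < deadzone_deg and abs(dpitch) < deadzone_deg:
        return "hold"  # gaze essentially unchanged: keep current course
    if abs(dyaw) >= abs(dpitch):
        return "right" if dyaw > 0 else "left"
    return "up" if dpitch > 0 else "down"
```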
Optionally, the first deep neural network is a DRCNN network, where the DRCNN includes one or more convolutional layers, one or more pooling layers, and fully-connected layers; the convolutional layers use 3 × 3 convolution kernels; and the excitation (activation) function adopted by the DRCNN is the sigmoid function.
optionally, the DRCNN utilizes a Sine-Index-Softmax (Sine-Index-Softmax) to improve the accuracy of the gaze tracking; the sine exponential loss function is:
wherein, theta
yiDenoted as sample i and its corresponding label y
iAngle of vector (b) in which
yiIndicating that sample i is at its label y
iDeviation of (a) from (b)
jRepresents the deviation at output node j; the N represents the number of training samples; said w
yiRepresenting a sample i on its label y
iThe weight of (c).
Optionally, the pooling method of the pooling layer is as follows:
S = f(e^(log w) + LOSS_SIS);
where S denotes the output of the current layer, f(·) denotes the activation function, w denotes the weight of the current layer, and LOSS_SIS is the sine exponential loss defined above.
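Numerically, the pooling rule above can be written out as follows; the sigmoid named earlier in the text is assumed as the activation f, and note that e^(log w) simplifies to w for w > 0.

```python
import math

# Sketch of the pooling rule S = f(e^(log w) + LOSS_SIS), taking the sigmoid
# named in the text as the activation f. Assumes w > 0 so log w is defined.

def sis_pooling(w, loss_sis):
    """Apply the pooling rule for a positive layer weight w."""
    sigmoid = lambda x: 1.0 / (1.0 + math.exp(-x))
    # e^(log w) is computed literally here, though it equals w for w > 0.
    return sigmoid(math.exp(math.log(w)) + loss_sis)
```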
Optionally, the gaze concentration is obtained by a second deep neural network implemented with an attention mechanism (an attention neural network). Optionally, this network may share convolutional features with the first neural network, or be trained independently to obtain convolutional features suited to its own model. The attention neural network divides the user's attention into a number of speed levels, optionally numbered 1, 2, 3, …, N, where the number indicates the speed: the smaller the number, the faster the flight speed, and conversely the larger the number, the slower the flight speed.
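Converting the attention network's output into a flight speed could look like the following sketch; the per-level score format, the argmax selection, and the maximum speed are assumptions for illustration.

```python
# Illustrative conversion of the attention network's per-level scores into a
# speed: pick the highest-scoring level and map it linearly to a speed, with
# smaller level numbers flying faster, as the text specifies. The score
# format and maximum speed are assumptions.

def speed_from_attention(level_scores, v_max=8.0):
    """level_scores: scores for levels 1..N; returns (level, speed_mps)."""
    n = len(level_scores)
    level = max(range(n), key=lambda k: level_scores[k]) + 1  # 1-based level
    return level, v_max * (n - level + 1) / n
```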
The excitation function adopted by the attention neural network is a cosine exponential excitation function, denoted g(·), where θ_{y_i} denotes the angle between the feature vector of sample i and the weight vector of its corresponding label y_i, N denotes the number of training samples, and w_{y_i} denotes the weight of sample i at its label y_i.
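The text names a cosine exponential excitation function g(·) but does not give its formula; one plausible form, assumed here purely for illustration, is the exponential of a cosine term.

```python
import math

# Hypothetical form of the 'cosine exponential excitation function' g():
# the formula is not given in the text, so g(x) = e^(cos x) is an assumption.

def cosine_exponential_activation(x):
    return math.exp(math.cos(x))
```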
Correspondingly, the invention also provides a virtual remote tourism experience method, comprising the following steps:
acquiring panoramic video frames in real time with a UAV equipped with a binocular camera;
obtaining the user's gaze direction with a sight-tracking module, the gaze direction being obtained using a trained first deep neural network model;
remotely controlling the UAV's flight with a flight control module, which steers the UAV using the tracked gaze direction;
processing the panoramic video frames collected by the binocular camera with a server processing module and sending them to a naked-eye 3D display screen; and
displaying the video on the naked-eye 3D display screen, providing the user with an immersive flight experience.
Optionally, the UAV communicates with the server processing module at high speed through a 5G communication module, which transmits the panoramic video frame data acquired by the binocular camera to the server processing module.
Optionally, the method further includes: playing, with a 3D loudspeaker module, audio captured during the UAV's flight.
Optionally, the method further includes: controlling, with the flight control module, the UAV's flight direction and speed using the gaze direction and the gaze concentration: the higher the gaze concentration, the faster the forward speed, while the flight direction follows the gaze direction.
Optionally, the method further includes: remotely controlling the UAV through the operating buttons of a console module; and/or remotely controlling the UAV through voice commands via a voice control module; and/or remotely controlling the UAV through recognized gestures via a gesture control module.
Optionally, where the control distance permits, the user may provide his or her own UAV; otherwise, a UAV can be rented at the scenic spot: after paying, the user rents a UAV whose management is handled by the scenic spot's integrated control center. For example, when several UAVs in flight may come into positional conflict, the center raises an alarm in advance and assigns flight positions and speeds that satisfy the separation requirements; in an emergency, the lessor directly takes over flight authority so that different UAVs do not collide, and once the risk is resolved, control is returned to the user.
Optionally, the first deep neural network is used to track the gaze, determine its change of state, and convert that change into a flight control command, for example: left, right, up, down, and so on.
Optionally, the first deep neural network is a DRCNN network, where the DRCNN includes one or more convolutional layers, one or more pooling layers, and fully-connected layers; the convolutional layers use 3 × 3 convolution kernels; and the excitation (activation) function adopted by the DRCNN is the sigmoid function.
Optionally, the DRCNN uses a Sine-Index-Softmax (SIS) loss function to improve the accuracy of the gaze tracking; the sine exponential loss function is:
LOSS_SIS = -(1/N) Σ_{i=1..N} log[ e^(||w_{y_i}|| sin θ_{y_i} + b_{y_i}) / Σ_j e^(||w_j|| sin θ_j + b_j) ]
where θ_{y_i} denotes the angle between the feature vector of sample i and the weight vector of its corresponding label y_i; b_{y_i} denotes the deviation (bias) of sample i at its label y_i; b_j denotes the deviation at output node j; N denotes the number of training samples; and w_{y_i} denotes the weight of sample i at its label y_i.
Optionally, the pooling method of the pooling layer is as follows:
S = f(e^(log w) + LOSS_SIS);
where S denotes the output of the current layer, f(·) denotes the activation function, w denotes the weight of the current layer, and LOSS_SIS is the sine exponential loss defined above.
Optionally, the gaze concentration is obtained by a second deep neural network implemented with an attention mechanism (an attention neural network). Optionally, this network may share convolutional features with the first neural network, or be trained independently to obtain convolutional features suited to its own model. The attention neural network divides the user's attention into a number of speed levels, optionally numbered 1, 2, 3, …, N, where the number indicates the speed: the smaller the number, the faster the flight speed, and conversely the larger the number, the slower the flight speed.
The excitation function adopted by the attention neural network is a cosine exponential excitation function, denoted g(·), where θ_{y_i} denotes the angle between the feature vector of sample i and the weight vector of its corresponding label y_i, N denotes the number of training samples, and w_{y_i} denotes the weight of sample i at its label y_i.
The present application also provides a computer-readable medium storing computer program instructions for executing any of the methods proposed by the invention.
In the description herein, references to the description of "one embodiment," "an example," "a specific example" or the like are intended to mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium may be, for example, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, and an optical disk.
The above description is only for the preferred embodiment of the present invention and is not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, or direct or indirect applications in other related fields, which are made by the present specification and drawings, are included in the scope of the present invention. The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.