Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
In the description of the present specification, the terms "comprising," "including," "having," "containing," and the like are used in an open-ended fashion, i.e., to mean including, but not limited to. Reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," etc., means that a particular feature, structure, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The sequence of steps involved in the embodiments is for illustrative purposes to illustrate the implementation of the present application, and the sequence of steps is not limited and can be adjusted as needed.
The inventors observe that, although data storage is independent of data content, data computation is directly tied to where the data is stored. The invention therefore aligns the data blocks of the HDFS with the geographic grids of the point cloud data: when the point cloud data is stored, it is divided into a plurality of data blocks according to the geographic grid, and the data blocks are stored on the storage node hosts of the HDFS. When the point cloud data corresponding to each geographic grid is computed in a distributed manner, the corresponding data block can be located from the geographic grid, and the computing program is scheduled to run on the storage node host that stores that data block. Computation is thus kept local to the data to the greatest extent, that is, the distributed computation is precisely targeted, cross-node data access is reduced, and distributed computation with linear horizontal scaling is achieved.
Based on the above inventive concept, an embodiment of the present invention provides an HDFS-based point cloud data processing method. Fig. 1 is a flowchart of the HDFS-based point cloud data processing method provided in the embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
S101: acquiring the collected point cloud data.
As an alternative embodiment, the point cloud data acquired in S101 may be ground point cloud data acquired by an airborne laser radar.
S102: performing meshing processing on the point cloud data to obtain a plurality of geographic grids corresponding to the point cloud data.
It should be noted that, in order to extract useful information from the collected point cloud data, a geographic grid is usually constructed according to the point cloud data to classify the point cloud data, and further extract different surface features. In the prior art, there are many methods for constructing a geographic grid according to point cloud data, and details of the method are not described in the embodiment of the present invention.
Optionally, after S102, the HDFS-based point cloud data processing method provided in the embodiment of the present invention may further include: encoding the point cloud data belonging to the same geographic grid into the same grid code. Because all point cloud data in one geographic grid shares one grid code, the geographic grid to which given point cloud data belongs can be looked up quickly from the point cloud data itself.
Further, after point cloud data belonging to the same geographic grid are encoded into the same grid code, as a preferred embodiment, a grid code library may be constructed for all collected point cloud data, and the grid code library includes grid codes of the point cloud data, so that when the point cloud data is subjected to calculation processing, a grid code corresponding to the point cloud data to be processed may be queried based on the grid code library.
As an optional implementation, the embodiment of the present invention performs the meshing processing on the point cloud data based on the GeoHash algorithm and encodes the point cloud data belonging to the same geographic grid into the same GeoHash character string. Building the spatial index with GeoHash can improve the efficiency of latitude and longitude retrieval over spatial point data.
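The GeoHash encoding mentioned above can be illustrated with a minimal sketch. It follows the publicly documented GeoHash scheme (alternating longitude and latitude bisection, five bits per base-32 character); the function name `geohash_encode` and the default precision are assumptions made for illustration and are not taken from the embodiment.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a GeoHash string of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # bits are interleaved, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits form one base-32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)
```

Points that fall in the same cell receive the same string, so the string can serve directly as the grid code; a longer precision gives a smaller cell.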
S103: dividing the point cloud data into a plurality of data blocks according to the geographic grids corresponding to the point cloud data, wherein each data block contains the point cloud data of one geographic grid.
Optionally, the size of each data block may be 64 MB or 256 MB.
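As a rough sketch of step S103, the points can be grouped by the code of the grid cell that contains them, giving one group per data block. The point layout `(lon, lat, elevation)`, the helper `cell_code`, and the cell size are hypothetical stand-ins; any grid code function, such as a GeoHash encoder, could be passed in instead.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]  # assumed layout: (lon, lat, elevation)


def cell_code(point: Point, cell_deg: float = 0.01) -> str:
    """Hypothetical grid code: column and row index of a square cell of `cell_deg` degrees."""
    lon, lat, _ = point
    return f"{math.floor(lon / cell_deg)}_{math.floor(lat / cell_deg)}"


def split_by_grid(
    points: Iterable[Point],
    code_of: Callable[[Point], str] = cell_code,
) -> Dict[str, List[Point]]:
    """Group points so that each group holds the points of exactly one grid cell."""
    blocks: Dict[str, List[Point]] = defaultdict(list)
    for p in points:
        blocks[code_of(p)].append(p)
    return dict(blocks)
```

Each value of the returned dictionary corresponds to one data block, and its key is the grid code that identifies the block.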
S104: storing the data blocks on the storage node hosts of the HDFS.
Optionally, before S104, the HDFS-based point cloud data processing method provided in the embodiment of the present invention may further include: identifying the data block corresponding to each geographic grid by the grid code of that geographic grid. Because each data block is identified by the grid code of its geographic grid, the storage node host that stores a given data block can be looked up from the grid code.
As can be seen from the above, in the HDFS-based point cloud data processing method provided in the embodiments of the present invention, the collected point cloud data is acquired and subjected to meshing processing to obtain a plurality of geographic grids corresponding to the point cloud data. The point cloud data is then divided into a plurality of data blocks according to these geographic grids, so that each data block contains the point cloud data of one geographic grid, and the data blocks are finally stored on the storage node hosts of the HDFS. Because the point cloud data on the HDFS is stored by geographic grid, when that data is to be computed, the storage node hosts that store the data blocks corresponding to the point cloud data to be processed can be looked up from the corresponding geographic grids, and the computing program only needs to be scheduled to run on those storage node hosts.
With the HDFS-based point cloud data processing method provided in the embodiment of the present invention, gridded distributed storage of the point cloud data is implemented on the HDFS, horizontal scaling of the storage space is supported, and cross-node data access during distributed computation is minimized.
As a preferred embodiment, the file storing each data block may be named after the grid code of the corresponding geographic grid.
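A minimal sketch of this naming convention follows. It writes one block to a local directory that stands in for an HDFS target; the `.bin` suffix, the fixed record layout of three little-endian float64 values, and the function names are all assumptions for illustration only.

```python
import struct
from pathlib import Path
from typing import Iterable, Tuple

Point = Tuple[float, float, float]  # assumed layout: (lon, lat, elevation)


def block_file_name(grid_code: str, suffix: str = ".bin") -> str:
    """Derive the block file name from the grid code (suffix is hypothetical)."""
    if not grid_code.isalnum():
        raise ValueError(f"unexpected grid code: {grid_code!r}")
    return grid_code + suffix


def write_block(out_dir: Path, grid_code: str, points: Iterable[Point]) -> Path:
    """Write the points of one geographic grid to a file named after its grid code."""
    path = Path(out_dir) / block_file_name(grid_code)
    with path.open("wb") as f:
        for lon, lat, z in points:
            f.write(struct.pack("<3d", lon, lat, z))  # 24 bytes per point
    return path
```

Because the file name is the grid code itself, a block can be found from its geographic grid without consulting the file contents.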
Further, the HDFS-based point cloud data processing method provided by the embodiment of the present invention may also include: recording the mapping relation between each geographic grid and the storage node host that stores the corresponding data block, so that the data block corresponding to the point cloud data to be processed can be looked up from its geographic grid based on the mapping relation.
In an optional embodiment, the HDFS-based point cloud data processing method according to the embodiment of the present invention may further include the following steps: acquiring the geographic grid corresponding to the point cloud data to be processed; locating, according to that geographic grid, the storage node host on the HDFS that stores the data block corresponding to the point cloud data to be processed, wherein a plurality of data blocks are stored on the storage node hosts of the HDFS and each data block contains the point cloud data of one geographic grid; and scheduling the computing program that performs computing processing on the point cloud data to be processed to run on the storage node host that stores the corresponding data block.
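The locating step can be sketched as a lookup that groups the grid codes to be processed by the host holding their data blocks. The function name `plan_dispatch`, the plain dictionary used as the mapping, and the host names in the usage note are hypothetical; the sketch only builds the plan and does not start any program.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def plan_dispatch(
    grid_codes: Iterable[str],
    code_to_host: Dict[str, str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group grid codes by the storage node host that holds their data block.

    Returns the per-host plan and the list of codes with no recorded host.
    """
    plan: Dict[str, List[str]] = defaultdict(list)
    missing: List[str] = []
    for code in grid_codes:
        host = code_to_host.get(code)
        if host is None:
            missing.append(code)  # grid was never stored, or the mapping is stale
        else:
            plan[host].append(code)
    return dict(plan), missing
```

For example, with the mapping `{"u4pru": "node-1", "u4prv": "node-2"}`, a request for both grids yields one task list per host, and the computing program would then be started once on each listed host.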
Based on the above embodiment, after the computing program that performs computing processing on the point cloud data to be processed has been scheduled to run on the storage node hosts on the HDFS that store the corresponding data blocks, as an optional implementation, the HDFS-based point cloud data processing method provided in the embodiment of the present invention may further include: obtaining the computation result of each storage node host; and merging the computation results of the storage node hosts to obtain the computation result of the point cloud data to be processed.
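How per-host results are merged depends on the computation, which the embodiment does not specify. As one hedged illustration, assume each host returns summary statistics of point elevation for its blocks; the `PartialStats` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PartialStats:
    """Hypothetical per-host result: elevation statistics over the host's local blocks."""
    count: int
    z_sum: float
    z_min: float
    z_max: float


def merge_results(parts: Iterable[PartialStats]) -> PartialStats:
    """Combine per-host partial results into one result for the whole request."""
    parts = list(parts)
    if not parts:
        raise ValueError("no partial results to merge")
    return PartialStats(
        count=sum(p.count for p in parts),
        z_sum=sum(p.z_sum for p in parts),
        z_min=min(p.z_min for p in parts),
        z_max=max(p.z_max for p in parts),
    )
```

Statistics of this kind merge exactly because they are associative; the overall mean elevation is `z_sum / count` of the merged result, so no point data has to cross node boundaries.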
It should be noted that, because the point cloud data is stored on the HDFS in advance according to the geographic grid, the storage node host that stores the point cloud data of a given geographic grid can be located from the geographic grid corresponding to the point cloud data to be processed. The computing program that performs computing processing on the point cloud data to be processed then only needs to be executed on the storage node host that stores that data.
Preferably, the computing program scheduled to run on the storage node host accesses the point cloud data to be processed stored on the HDFS through the POSIX file proxy gateway. This provides POSIX file proxy access, that is, the computing program can access the HDFS file system transparently.
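Transparent access means that the computing program can use ordinary file calls. The sketch below reads one block through a local path, assuming the gateway exposes the HDFS namespace under some mount directory; the mount path, the `.bin` suffix, and the record layout of three little-endian float64 values are assumptions, not details given in the embodiment.

```python
import struct
from pathlib import Path
from typing import List, Tuple

RECORD = struct.Struct("<3d")  # assumed record layout: lon, lat, elevation


def read_block(mount_root: Path, grid_code: str) -> List[Tuple[float, float, float]]:
    """Read the block file of one geographic grid using plain POSIX file access."""
    path = Path(mount_root) / f"{grid_code}.bin"  # file named after the grid code
    data = path.read_bytes()
    if len(data) % RECORD.size:
        raise ValueError(f"{path} is not a whole number of records")
    return [RECORD.unpack_from(data, off) for off in range(0, len(data), RECORD.size)]
```

The function contains no HDFS client code, which is the point of the gateway: the same program runs unchanged against a local directory or a mounted HDFS path.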
As a preferred implementation, the HDFS-based point cloud data processing method provided by the embodiment of the present invention can be applied to, but is not limited to, the point cloud data storage system architecture shown in Fig. 2. As shown in Fig. 2, the architecture includes: a Hadoop Distributed File System (HDFS), a POSIX file proxy gateway, a node scheduling service system, storage node hosts, a grid code library (GeoHash code index library), and a storage node host information library. The HDFS stores the gridded point cloud data in a distributed manner. The node scheduling service system is connected to both the grid code library and the storage node host information library. Based on the mapping relation between grid codes and storage node hosts recorded in the grid code library, it locates the storage node host on the HDFS that stores the point cloud data to be processed according to the grid code of the corresponding geographic grid, and schedules the computing program to run on that storage node host, so that the computing program on the storage node host transparently accesses the data stored on the HDFS through the POSIX file interface of the POSIX file proxy gateway.
As shown in Fig. 2, the overall storage system structure includes the following three parts:
(1) Data block generation and distributed storage based on a geographic grid.
(2) A file uploaded to the HDFS is first sliced into file blocks based on the geographic grid (the size of each file block may be 64 MB or 256 MB), and geographic grid coding is performed to generate the grid code library. The two-tuples (geocodes, nodecodes) of the corresponding storage nodes are recorded in a database, where geocodes is the grid code of a geographic grid and nodecodes is the identifier of the storage node host on which the file block is stored on the HDFS.
(3) POSIX file proxy access is provided through the POSIX file proxy gateway, that is, an algorithm program can access the HDFS file system transparently.
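The recording of (geocodes, nodecodes) two-tuples in a database can be sketched with SQLite from the Python standard library. The embodiment does not name a database product, so SQLite, the table name `grid_index`, and the function names are assumptions; the column names mirror the tuple fields in the text.

```python
import sqlite3
from typing import Optional


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the grid code library as a single two-column table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS grid_index ("
        "geocodes TEXT PRIMARY KEY, nodecodes TEXT NOT NULL)"
    )
    return conn


def record_block(conn: sqlite3.Connection, geocode: str, nodecode: str) -> None:
    """Record which storage node host holds the block of a geographic grid."""
    conn.execute(
        "INSERT OR REPLACE INTO grid_index (geocodes, nodecodes) VALUES (?, ?)",
        (geocode, nodecode),
    )
    conn.commit()


def lookup_host(conn: sqlite3.Connection, geocode: str) -> Optional[str]:
    """Return the storage node host identifier for a grid code, or None if unknown."""
    row = conn.execute(
        "SELECT nodecodes FROM grid_index WHERE geocodes = ?", (geocode,)
    ).fetchone()
    return row[0] if row else None
```

Using the grid code as the primary key keeps one row per geographic grid; a deployment that tracks several replicas per block would need a different key, which this sketch does not attempt.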
Fig. 3 is a flowchart of the HDFS-based distributed computing scheduling service for point cloud data provided in an embodiment of the present invention. As shown in Fig. 3, a program registration module is configured to generate a computing program for performing computing processing on the point cloud data. A computing scheduling module is configured to retrieve, from the grid code library, the storage node hosts corresponding to the grid codes of the geographic grids of the point cloud data to be processed, thereby forming a storage node host chain for the point cloud data to be processed, and then to schedule the computing program to run on the storage node hosts that store the point cloud data to be processed.
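The two modules can be pictured with a small in-process sketch. The class `ComputeScheduler` and its methods are hypothetical, registration here simply stores a callable under a name, and `dispatch` calls the program locally as a stand-in for starting it on the remote host, which a real scheduling service would do through its own remote execution mechanism.

```python
from typing import Callable, Dict, Iterable, List, Tuple


class ComputeScheduler:
    """Illustrative stand-in for the program registration and computing scheduling modules."""

    def __init__(self, code_to_host: Dict[str, str]) -> None:
        self._code_to_host = dict(code_to_host)  # grid code -> storage node host
        self._programs: Dict[str, Callable[[str], object]] = {}

    def register(self, name: str, program: Callable[[str], object]) -> None:
        """Register a computing program that processes the block of one grid code."""
        self._programs[name] = program

    def host_chain(self, grid_codes: Iterable[str]) -> List[str]:
        """Ordered, de-duplicated hosts that store the requested grids."""
        chain: List[str] = []
        for code in grid_codes:
            host = self._code_to_host[code]  # KeyError if the grid was never stored
            if host not in chain:
                chain.append(host)
        return chain

    def dispatch(self, name: str, grid_codes: Iterable[str]) -> List[Tuple[str, str, object]]:
        """Run the named program once per grid; returns (host, grid code, result) tuples."""
        program = self._programs[name]
        # Local call used in place of remote execution on the storage node host.
        return [(self._code_to_host[c], c, program(c)) for c in grid_codes]
```

The host chain is computed before dispatch so that each host receives only the grids whose blocks it already stores.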
Therefore, the method for processing the point cloud data based on the HDFS provided by the embodiment of the invention divides the point cloud data into a plurality of data blocks according to the geographic grid and stores the data blocks on the HDFS, thereby realizing the data block distributed storage based on the geographic grid and the distributed cluster calculation based on the distributed data blocks.
Taking ground point cloud data acquired by an airborne laser radar as an example: in the embodiment of the invention, after the ground point cloud data is acquired by the airborne laser radar, a geographic grid is constructed from the ground point cloud data. The point cloud file is then divided into a plurality of data blocks according to the geographic grid based on GeoHash (the point cloud data of the same grid is pre-encoded with GeoHash). The divided data blocks correspond exactly to the storage data blocks of the HDFS, and each data block file is named after its GeoHash code. When point cloud data to be processed is obtained, the corresponding GeoHash code is looked up, the storage node host on the HDFS that stores the point cloud data to be processed is quickly located, and the computing program that performs computing processing on that data is started on that storage node host. Because the data block storage of the HDFS is thus aligned with the data blocks of the geographic grid, cross-node file access during computation can be reduced to the greatest extent. Optionally, the node host accesses the HDFS file system through the POSIX file proxy gateway, so that the HDFS file system can be accessed transparently as ordinary files.
Based on the same inventive concept, an embodiment of the present invention further provides an HDFS-based point cloud data processing apparatus, as described in the following embodiments. Because the apparatus embodiment solves the problem on a principle similar to that of the HDFS-based point cloud data processing method, its implementation can refer to the implementation of the method, and repeated details are omitted.
Fig. 4 is a schematic diagram of an HDFS-based point cloud data processing apparatus according to an embodiment of the present invention. As shown in Fig. 4, the apparatus includes: a point cloud data acquisition module 401, a point cloud data meshing processing module 402, a point cloud data segmentation module 403, and a point cloud data meshing distributed storage module 404.
The point cloud data acquisition module 401 is configured to acquire the collected point cloud data; the point cloud data meshing processing module 402 is configured to perform meshing processing on the point cloud data to obtain a plurality of geographic grids corresponding to the point cloud data; the point cloud data segmentation module 403 is configured to divide the point cloud data into a plurality of data blocks according to the geographic grids corresponding to the point cloud data, where each data block contains the point cloud data of one geographic grid; and the point cloud data meshing distributed storage module 404 is configured to store the data blocks on the storage node hosts of the HDFS.
As can be seen from the above, in the HDFS-based point cloud data processing apparatus provided in the embodiment of the present invention, the point cloud data acquisition module 401 acquires the collected point cloud data, and the point cloud data meshing processing module 402 performs meshing processing on the point cloud data to obtain a plurality of geographic grids corresponding to the point cloud data. The point cloud data segmentation module 403 then divides the point cloud data into a plurality of data blocks according to these geographic grids, so that each data block contains the point cloud data of one geographic grid, and finally the point cloud data meshing distributed storage module 404 stores the data blocks on the storage node hosts of the HDFS. Because the point cloud data on the HDFS is stored by geographic grid, when that data is to be computed, the storage node hosts that store the data blocks corresponding to the point cloud data to be processed can be looked up from the corresponding geographic grids, and the computing program only needs to be scheduled to run on those storage node hosts.
With the HDFS-based point cloud data processing apparatus provided in the embodiment of the present invention, gridded distributed storage of the point cloud data is implemented on the HDFS, horizontal scaling of the storage space is supported, and cross-node data access during distributed computation is minimized.
In an optional embodiment, the HDFS-based point cloud data processing apparatus provided in an embodiment of the present invention further includes: a to-be-processed point cloud data acquisition module 405, configured to acquire the geographic grid corresponding to the point cloud data to be processed; a storage node host positioning module 406, configured to locate, according to the geographic grid corresponding to the point cloud data to be processed, the storage node host on the HDFS that stores the data block corresponding to the point cloud data to be processed; and a point cloud data distributed computing module 407, configured to schedule the computing program that performs computing processing on the point cloud data to be processed to run on the storage node host on the HDFS that stores the corresponding data block.
Optionally, the computing program may access the point cloud data to be processed stored on the HDFS through a POSIX file proxy gateway.
In an optional embodiment, the HDFS-based point cloud data processing apparatus provided in an embodiment of the present invention further includes: a point cloud data encoding module 408, configured to encode point cloud data belonging to the same geographic grid into the same grid code.
In an optional embodiment, the HDFS-based point cloud data processing apparatus provided in an embodiment of the present invention further includes: a data block identification module 409, configured to identify the data block corresponding to each geographic grid by the grid code of that geographic grid.
In an optional embodiment, the HDFS-based point cloud data processing apparatus provided in an embodiment of the present invention further includes: a mapping relation recording module 410, configured to record the mapping relation between each geographic grid and the storage node host that stores the corresponding data block.
An embodiment of the present invention also provides a computer device for solving the technical problem in the prior art that the amount of computation grows in step with the volume of point cloud data, whether the data is stored centrally or in a distributed manner.
An embodiment of the present invention also provides a computer readable storage medium for solving the technical problem in the prior art that the amount of computation grows in step with the volume of point cloud data, whether the data is stored centrally or in a distributed manner.
In summary, embodiments of the present invention provide an HDFS-based point cloud data processing method, apparatus, computer device, and computer readable storage medium. Based on distributed storage, the point cloud data is divided into a plurality of data blocks according to the geographic grid, and the data blocks are stored on the storage node hosts of the HDFS. The data storage nodes can be scaled horizontally based on the HDFS, and computation is performed where the data resides, which minimizes cross-node data access and thereby achieves distributed computation with linear horizontal scaling.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.