CN120182509A - Method, device, storage medium and equipment for indoor scene reconstruction - Google Patents

Method, device, storage medium and equipment for indoor scene reconstruction

Info

Publication number
CN120182509A
Authority
CN
China
Prior art keywords
point cloud
data
color
point
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202510653377.5A
Other languages
Chinese (zh)
Other versions
CN120182509B (en)
Inventor
赵吴凡
张帅
华彤延
洪忠铖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hong Kong University Of Science And Technology Guangzhou
Original Assignee
Hong Kong University Of Science And Technology Guangzhou
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hong Kong University Of Science And Technology Guangzhou
Priority to CN202510653377.5A
Publication of CN120182509A
Application granted
Publication of CN120182509B
Legal status: Active
Anticipated expiration

Abstract


The present invention provides a method, an apparatus, a storage medium and a device for indoor scene reconstruction, comprising: obtaining laser radar point cloud information and a panoramic RGB image; preprocessing the panoramic RGB image to obtain a six-sided cube image; projecting the laser radar point cloud information onto the six-sided cube image in combination with the internal and external parameters of a panoramic camera, extracting the color information of each point in the point cloud information, and generating color point cloud data; generating an RGB-D data sequence according to the distance between each point in the color point cloud data and the panoramic camera; inputting the six-sided cube image into an object segmentation model to divide the indoor scene into a plurality of independent object areas; inputting each independent object area into a vision-language model to obtain a semantic label of each independent object; projecting and aligning each independent object containing the semantic label with the RGB-D sequence to obtain aligned point cloud data; and inputting the aligned point cloud data into a neural kernel surface reconstruction model to obtain a reconstructed indoor scene.

Description

Method, device, storage medium and equipment for reconstructing indoor scene
Technical Field
The present invention relates to the field of image processing, and in particular, to a method, an apparatus, a storage medium, and a device for indoor scene reconstruction.
Background
In recent years, under the cooperative driving of artificial intelligence, robotics and space intelligence, indoor perception and three-dimensional reconstruction technology gradually become hot fields of academic research and engineering practice. The rise of the multi-mode sensor data fusion technology provides a new solution for the perception and reconstruction of indoor scenes. Based on sensors such as lidar and cameras, conventional three-dimensional reconstruction techniques typically generate the geometry of a scene through the processing of point cloud data or images.
However, as the application scenarios of three-dimensional reconstruction technology continue to expand, existing three-dimensional reconstruction techniques face several problems: first, point cloud information carrying only geometric information is difficult to meet the requirements of high-level applications such as intelligent robot navigation and task planning; second, the processing and computation of large-scale point cloud data are highly complex and insufficient for real-time use; and finally, the use of a single sensor limits the detail expression of the scene and the diversity of applicable scenes.
Disclosure of Invention
In view of this, the present invention provides a method, a device, a storage medium and equipment for indoor scene reconstruction, in which point cloud data and a panoramic RGB image are fused to construct image information carrying both depth information and color information; based on the fused image information, an object segmentation model and a vision-language model are used to semantically annotate the objects in the scene, so that an indoor scene containing semantic information is constructed, which can provide accurate and rich data support for three-dimensional modeling or path planning scenarios.
In a first aspect, the present invention provides a method for indoor scene reconstruction, including:
acquiring laser radar point cloud information and a panoramic RGB image;
preprocessing the panoramic RGB image to obtain a hexahedral cube image;
Projecting the laser radar point cloud information to a hexahedral cube image by combining internal parameters and external parameters of a panoramic camera, extracting color information of each point in the point cloud information, and generating color point cloud data;
Generating an RGB-D data sequence according to the distance between each point in the color point cloud data and the panoramic camera;
inputting the hexahedral cube images into an object segmentation model, and dividing an indoor scene into a plurality of independent object areas;
inputting each independent object area into a vision-language model to obtain semantic tags of each independent object;
Performing projection alignment on each independent object containing the semantic tag and the RGB-D sequence to obtain aligned point cloud data;
And inputting the aligned point cloud data into a neural kernel surface reconstruction model to obtain a reconstructed indoor scene.
Further, the method for reconstructing the indoor scene further comprises the following steps:
searching corresponding independent object results in the reconstructed indoor scene according to the received user instruction;
The searched independent object result is sent to a user side;
If the independent object result fed back by the user side is inconsistent with the user instruction, adding a supplementary semantic tag to the independent object area indicated by the user instruction.
Further, the object segmentation model is an instance segmentation model.
Further, the visual-language model is a contrast language-image pre-training model or Grounding DINO model.
Further, the preprocessing is performed on the panoramic RGB image to obtain a hexahedral cube image, which specifically includes:
the panoramic RGB image is converted into a hexahedral cube image by adopting an equidistant columnar projection mode.
Further, the step of projecting the laser radar point cloud information to a hexahedral cube image by combining the internal parameters and the external parameters of the panoramic camera, extracting color information of each point in the point cloud information, and generating color point cloud data includes the following steps:
the following steps are performed for any one target point in the laser point cloud information:
Step S201, correcting the coordinates of the target point according to the external parameters of the panoramic camera to obtain corrected target point coordinates;
step S202, rotating the corrected target point coordinates according to the internal parameters of the panoramic camera to obtain target point mapping coordinates;
Step S203, combining the width and the height of the panoramic RGB image to obtain the projection point coordinates of the target point on the hexahedral cube;
Step S204, the color data of the projection point is recorded as the color data of the target point;
Step S205, repeating steps S201-S204 until each target point in the laser point cloud information determines color data, and generating color point cloud data.
Further, the generating the RGB-D data sequence according to the distance between each point in the color point cloud data and the panoramic camera includes the following steps:
Generating a depth map according to the distance between each point in the color point cloud data and the panoramic camera;
And combining the depth map with the color point cloud data to obtain an RGB-D data sequence.
Further, the combining the depth map with color point cloud data to obtain an RGB-D data sequence further includes:
And for each target point of the color point cloud data, if the depth value of the target point is smaller than the historical depth value, updating the color characteristics of the target point.
The invention also provides a device for reconstructing an indoor scene, which comprises:
the image acquisition module is used for acquiring laser radar point cloud information and panoramic RGB images;
the panoramic hexahedral module is used for preprocessing the panoramic RGB image to obtain a hexahedral cube image;
the point cloud data color extraction module is used for projecting the laser radar point cloud information to a hexahedral cube image in combination with the internal parameters and the external parameters of the panoramic camera, extracting the color information of each point in the point cloud information and generating color point cloud data;
the depth parameter combination module is used for generating an RGB-D data sequence according to the distance between each point in the color point cloud data and the panoramic camera;
The object segmentation module is used for inputting the hexahedral cube images into an object segmentation model and dividing an indoor scene into a plurality of independent object areas;
the semantic annotation module is used for inputting each independent object region into the vision-language model to obtain semantic tags of each independent object;
The data fusion module is used for carrying out projection alignment on each independent object containing the semantic tag and the RGB-D sequence to obtain aligned point cloud data;
And the scene reconstruction module is used for inputting the aligned point cloud data into the neural kernel surface reconstruction model to obtain a reconstructed indoor scene.
In a third aspect, the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any one of the first aspects of indoor scene reconstruction.
In a fourth aspect, the present invention also provides a computer device comprising a memory storing a computer program and a processor, which when executing the computer program, performs the method of any one of the indoor scene reconstruction of the first aspect.
The technical scheme has the advantages that the laser radar point cloud information and the panoramic RGB image are fused, accurate alignment of the handheld laser radar data and the panoramic camera data is achieved, and an open-vocabulary three-dimensional scene graph is built through the algorithm, so that unseen objects can be identified and semantically rich labels can be generated, breaking through the limitation of predefined categories and laying a technical foundation for open semantic segmentation. In the aspect of scene reconstruction, a neural kernel surface reconstruction model is adopted to reconstruct the point cloud of each object instance with high precision in both geometry and texture, so that the complex structure and details of the scene are effectively captured while color and texture information is faithfully restored. By means of object-level instance segmentation and scene graph construction with real-time updating, the method and the device can adapt to changes of a dynamic scene, such as the movement, addition or removal of objects, and the understanding and modeling capability of the dynamic environment is remarkably improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a schematic diagram of a method for indoor scene reconstruction according to an embodiment of the present application;
FIG. 2 is a schematic diagram of data acquisition in an indoor scene reconstruction method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of combining panoramic RGB image with lidar point cloud information in an embodiment of the present application;
FIG. 4 is a schematic flow chart of a portion of a scene reconstruction in a method for reconstructing an indoor scene according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a semantic tag labeling flow in a method for indoor scene reconstruction according to an embodiment of the present application;
FIG. 6 is a schematic diagram of object segmentation in a method for indoor scene reconstruction according to an embodiment of the present application;
fig. 7 is a schematic diagram of an apparatus for indoor scene reconstruction according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. In order to more specifically describe the present invention, the method, apparatus, storage medium and device for indoor scene reconstruction provided by the present invention are specifically described below with reference to the accompanying drawings.
Unless defined otherwise, technical or scientific terms used in the present disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which the present application belongs. The terms "first," "second," and the like, as used herein, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. Likewise, the terms "a," "an," or "the" and similar terms do not denote a limitation of quantity, but rather denote the presence of at least one. The word "comprising" or "comprises", and the like, means that the element or article preceding the word is meant to encompass the element or article listed thereafter and equivalents thereof without excluding other elements or articles. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which can be changed accordingly when the absolute position of the object to be described is changed.
In recent years, the rise of the multi-mode sensor data fusion technology provides a new solution for the perception and reconstruction of indoor scenes. By combining high-precision point cloud data of the laser radar with rich visual information of the camera, researchers can introduce semantic-level understanding while retaining geometric information. However, how to efficiently fuse multi-modal data and achieve end-to-end high precision and efficiency in scene segmentation, semantic annotation, and three-dimensional reconstruction is still a challenging research topic.
Deep learning semantic segmentation technologies based on laser radar (LiDAR) point clouds, such as the RandLA-Net model (a point cloud semantic segmentation model based on random sampling), the KPConv model (Kernel Point Convolution, a kernel point convolution model) and the SparseConvNet model (Submanifold Sparse Convolutional Network, a sparse convolution model), realize object recognition and semantic segmentation in a scene by modeling the geometric characteristics of point cloud data. These methods perform well on sparse point clouds, and can capture the local geometric characteristics of the point cloud and generate three-dimensional semantic information. However, they have significant limitations: they depend on geometric characteristics and lack semantic-level understanding, cannot segment or identify new objects of unknown types or in complex scenes, have difficulty handling object changes in dynamic scenes, and cannot update semantic information in real time. In addition, because the laser radar point cloud data volume is huge, directly processing large-scale point cloud data consumes considerable computing resources and memory, making it difficult to meet the requirements of real-time applications. Segmentation methods based on point cloud data alone are therefore limited in open-scene and multi-modal tasks.
Semantic segmentation technologies based on RGB images (such as Mask R-CNN, DeepLab and Vision Transformer) rely on convolutional neural networks or visual Transformers to achieve excellent results in image feature extraction and instance segmentation, so that various object types can be identified and high-resolution segmentation results can be generated. However, such methods depend only on two-dimensional images, have limited capability for modeling three-dimensional geometric information, and have difficulty handling the spatial structure and depth information in point cloud data. In addition, these technologies are heavily dependent on annotated data, cannot identify new categories outside a closed-set vocabulary, and have weak generalization capability; they lack the ability to dynamically update semantic information in dynamic scenes, cannot adapt to scene changes in real time, and therefore struggle to support complex scene planning tasks, such as robot navigation and grasping tasks that require spatial semantic relations.
Furthermore, the prior art also includes multi-modal semantic segmentation technologies (such as BEVFusion, FusionMLP and MaskFusion), which realize semantic segmentation and understanding of three-dimensional scenes by fusing the advantages of laser radar point cloud information and RGB image data. By utilizing the semantic information in the image data and the geometric information of the laser radar point cloud data, the scene can be described more comprehensively and the precision of semantic segmentation is remarkably improved. However, multi-modal data fusion increases the computational complexity and places higher requirements on hardware resources, which limits the application of multi-modal semantic segmentation technology in real-time tasks.
Based on the consideration of the prior art, the invention provides a method for reconstructing an indoor scene, which obtains an RGB-D image by fusing point cloud data and a panoramic RGB image, and performs object segmentation and semantic annotation by combining an instance segmentation model and a visual-language model to obtain the indoor scene with a semantic tag.
The embodiment of the application provides an application scene of a method for reconstructing an indoor scene, which comprises terminal equipment provided by the embodiment, wherein the terminal equipment comprises, but is not limited to, a smart phone and computer equipment, and the computer equipment can be at least one of a desktop computer, a portable computer, a laptop computer, a mainframe computer, a tablet computer and the like. The terminal device receives indoor point cloud data sent by the laser radar and panoramic RGB images sent by the panoramic camera, and constructs a three-dimensional scene containing semantic information, and referring to a schematic diagram of an indoor scene reconstruction method shown in fig. 1, for specific processes, please refer to an embodiment of the indoor scene reconstruction method.
Step S101, laser radar point cloud information and a panoramic RGB image are acquired.
The laser radar point cloud data refer to a vector set of an indoor scene in a three-dimensional coordinate system, which is acquired by using a laser radar, and is used for providing three-dimensional geometric structure information of the indoor scene, in combination with a data acquisition schematic diagram in the indoor scene reconstruction method shown in fig. 2. Panoramic RGB image refers to a 360-degree panoramic RGB image of an indoor scene acquired by a panoramic camera, the RGB image is an image which generates various colors based on different intensity combinations of three basic colors of Red (Red), green (Green) and Blue (Blue), each pixel point in the RGB image is usually composed of three components, the brightness values of the three colors of Red, green and Blue are respectively represented, and the value range of each component in the RGB image is 0-255.
In this embodiment, as shown in fig. 2, laser radar point cloud information is acquired by a handheld laser radar to provide the three-dimensional geometric structure information of the indoor scene, and 360-degree panoramic RGB images of the indoor scene are acquired by a panoramic camera to supplement semantic information and texture details. In this embodiment, the handheld laser radar may be a smart L1 laser radar, and the panoramic camera may be an Insta360 panoramic camera. In this embodiment, the handheld laser radar and the panoramic camera are designed as an integrated unit, so that the relative pose between them remains fixed during data acquisition.
Step S102, preprocessing the panoramic RGB image to obtain a hexahedral cube image.
Specifically, considering that the panoramic RGB image is severely stretched and distorted in the polar regions (such as the top and bottom of the panoramic RGB image) due to its projection mode, which blurs or deforms image details and is unfavorable for the subsequent indoor scene reconstruction process, the panoramic RGB image needs to be converted into a hexahedral cube image, which is closer to a conventional planar image and exhibits little distortion. It should be noted that, in this embodiment, the panoramic RGB image may be converted into the hexahedral cube image by using an equidistant cylindrical (equirectangular) projection. Specifically, each pixel of the panoramic RGB image is described by spherical coordinates (θ, φ), where θ represents the longitude of the pixel on the sphere and φ represents its latitude; each such pixel is converted into a pixel on the corresponding face of the hexahedral cube image according to its viewing direction.
The equidistant columnar projection refers to projecting each part of the stereoscopic image on a plane in an equidistant mode, so that the size of the image on the projection plane corresponds to the size of the stereoscopic image one by one. The purpose of equidistant columnar projection is to show a larger stereoscopic image on a smaller projection plane while keeping the shape and scale of the image unchanged. Each surface of the hexahedral cube image after equidistant columnar projection is closer to the perspective of a conventional plane image, and the middle area has almost no distortion, so that the method is more beneficial to the subsequent indoor scene object perception.
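To make the conversion concrete, the following Python sketch samples one cube face from an equirectangular panorama using the standard longitude/latitude lookup. It is a minimal illustration rather than the patented implementation; the face naming, face orientation conventions and output resolution are assumptions.

```python
import numpy as np

def cube_face_from_equirect(pano, face, size=512):
    """Sample one face of a cube map from an equirectangular panorama.

    pano: H x W x 3 uint8 array (equirectangular RGB panorama).
    face: one of "front", "right", "back", "left", "up", "down" (assumed naming).
    """
    H, W = pano.shape[:2]
    # Pixel grid of the target face in [-1, 1]; image y axis points down.
    a = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(a, -a)

    # Viewing direction for every face pixel (unit cube, normalised later).
    if face == "front":
        dirs = np.stack([x, y, np.ones_like(x)], axis=-1)
    elif face == "right":
        dirs = np.stack([np.ones_like(x), y, -x], axis=-1)
    elif face == "back":
        dirs = np.stack([-x, y, -np.ones_like(x)], axis=-1)
    elif face == "left":
        dirs = np.stack([-np.ones_like(x), y, x], axis=-1)
    elif face == "up":
        dirs = np.stack([x, np.ones_like(x), -y], axis=-1)
    else:  # "down"
        dirs = np.stack([x, -np.ones_like(x), y], axis=-1)

    dx, dy, dz = dirs[..., 0], dirs[..., 1], dirs[..., 2]
    lon = np.arctan2(dx, dz)                                  # longitude in [-pi, pi]
    lat = np.arcsin(dy / np.linalg.norm(dirs, axis=-1))       # latitude in [-pi/2, pi/2]

    # Standard equirectangular lookup: longitude -> column, latitude -> row.
    u = ((lon + np.pi) / (2 * np.pi) * (W - 1)).astype(int)
    v = ((np.pi / 2 - lat) / np.pi * (H - 1)).astype(int)
    return pano[v, u]
```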
Step S103, combining the internal parameters and the external parameters of the panoramic camera, projecting the laser radar point cloud information to a hexahedral cube image, extracting the color information of each point in the point cloud information, and generating color point cloud data.
Specifically, in combination with the schematic diagram of combining the panoramic RGB image and the laser radar point cloud information shown in fig. 3, in this embodiment the panoramic camera and the laser radar are integrally arranged; the internal parameters of the panoramic camera include its focal length and principal point coordinates, and the external parameters of the panoramic camera are the relative pose between the laser radar and the panoramic camera. Using these parameters, the point cloud data are accurately projected onto the hexahedral cube image, the color information of each point in the point cloud information is extracted, and color point cloud data are generated. The color point cloud data include the color information of the point cloud data in addition to its three-dimensional structure information.
The embodiment projects laser radar point cloud information to a hexahedral cube image, extracts color information of each point in the point cloud information, and generates color point cloud data by the following method:
The following operations are performed for any one target point of the laser point cloud information:
step S201, correcting the coordinates of the target point according to the external parameters of the panoramic camera to obtain corrected target point coordinates.
Specifically, the coordinates of the target point are corrected according to the external parameters of the panoramic camera as P_c = T · P, where P_c denotes the corrected target point coordinates, P denotes the coordinates of the target point, and T denotes the external parameters of the panoramic camera, representing the relative pose between the laser radar and the panoramic camera.
Step S202, rotating the corrected target point coordinates according to the internal parameters of the panoramic camera to obtain target point mapping coordinates.
Specifically, the corrected target point coordinates are rotated according to the internal parameters of the panoramic camera as p = K · P_c, where p denotes the target point mapping coordinates and K denotes the intrinsic matrix of the panoramic camera, in which f_x is the focal length of the panoramic camera in the horizontal direction, f_y is the focal length in the vertical direction, c_x is the abscissa of the image principal point, and c_y is the ordinate of the image principal point.
Step S203, combining the width and the height of the panoramic RGB image to obtain the projection point coordinates of the target point on the hexahedral cube.
Specifically, the projection point coordinates of the target point on the hexahedral cube are obtained by combining the width and the height of the panoramic RGB image as u = W · (θ + π) / (2π) and v = H · (π/2 − φ) / π, where (u, v) are the coordinates of the projection point on the hexahedral cube, W is the width of the panoramic RGB image, H is the height of the panoramic RGB image, and (θ, φ) are the spherical coordinates of the panoramic RGB image corresponding to the target point mapping coordinates.
Step S204, the color data of the projection point is recorded as the color data of the target point.
Specifically, the color data of the projection point is recorded as the color data of the target point, i.e. C(P) = I(u, v), where C(P) denotes the color data of the target point and I(u, v) denotes the color data of the projection point.
Step S205, repeating steps S201-S204 until each target point in the laser point cloud information determines color data, and generating color point cloud data.
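A compact sketch of steps S201-S204 is given below. It assumes the extrinsics are supplied as a 4x4 homogeneous matrix and that colors are read from the equirectangular panorama via the spherical lookup described above; the exact formulation used in the patent may differ.

```python
import numpy as np

def colorize_point_cloud(points, pano, T_ext):
    """Assign a color to each lidar point by projecting it into the panorama.

    points: N x 3 lidar coordinates.
    pano:   H x W x 3 panoramic RGB image.
    T_ext:  4 x 4 extrinsic matrix (relative pose between lidar and camera).
    Returns an N x 6 array of (x, y, z, r, g, b).
    """
    H, W = pano.shape[:2]

    # Step S201: correct the point coordinates with the camera extrinsics.
    homo = np.hstack([points, np.ones((len(points), 1))])
    cam = (T_ext @ homo.T).T[:, :3]

    # Steps S202/S203: map each corrected point to panorama pixel coordinates
    # through its spherical angles (longitude/latitude).
    r = np.linalg.norm(cam, axis=1) + 1e-9
    lon = np.arctan2(cam[:, 0], cam[:, 2])
    lat = np.arcsin(cam[:, 1] / r)
    u = ((lon + np.pi) / (2 * np.pi) * (W - 1)).astype(int)
    v = ((np.pi / 2 - lat) / np.pi * (H - 1)).astype(int)

    # Step S204: record the projected pixel's color as the point's color.
    colors = pano[v, u]
    return np.hstack([points, colors])
```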
Step S104, according to the distance between each point in the color point cloud data and the panoramic camera, an RGB-D data sequence is generated.
Specifically, according to the distance between each point in the color point cloud data and the panoramic camera, the generation of the RGB-D data sequence comprises the following steps:
Step S301, generating a depth map according to the distance between each point in the color point cloud data and the panoramic camera;
step S302, combining the depth map with color point cloud data to obtain an RGB-D data sequence.
The step S302 of combining the depth map with the color point cloud data to obtain the RGB-D data sequence includes the following specific steps:
For any depth image pixel coordinate (u, v) in the depth map, the pixel is back-projected into spatial coordinates using the panoramic camera parameters; the nearest neighbor point of these spatial coordinates is found in the color point cloud data, and the RGB value of that nearest neighbor point is matched to the depth image pixel coordinate (u, v). Thereby, a four-tuple (R, G, B, D) can be generated by combining the pixel coordinates of the depth image with the RGB values, where (R, G, B) is the RGB value of the nearest neighbor point and D is the distance value of the depth image pixel. When every depth image pixel coordinate in the depth map has been assigned the RGB value of its nearest neighbor point in the color point cloud data, the corresponding four-tuples are generated; the four-tuples of all depth image pixels in the depth map then form a four-channel image or sequence array, which is sorted by frame number to obtain the RGB-D data sequence.
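The following sketch illustrates one plausible way to build such a four-channel RGB-D frame, using a KD-tree for the nearest-neighbor search. The back-projection function is a placeholder standing in for the panoramic camera model, which the patent does not spell out here.

```python
import numpy as np
from scipy.spatial import cKDTree

def rgbd_from_depth_and_points(depth, colored_points, back_project):
    """Build an H x W x 4 RGB-D frame from a depth map and a colored point cloud.

    depth:          H x W array of distances to the panoramic camera.
    colored_points: N x 6 array (x, y, z, r, g, b) of colored point cloud data.
    back_project:   callable (u, v, d) -> 3D point, using the camera parameters.
    """
    H, W = depth.shape
    tree = cKDTree(colored_points[:, :3])          # nearest-neighbor search structure
    rgbd = np.zeros((H, W, 4), dtype=np.float32)

    for v in range(H):
        for u in range(W):
            d = depth[v, u]
            if d <= 0:                              # no point projected here
                continue
            p = back_project(u, v, d)               # pixel -> spatial coordinates
            _, idx = tree.query(p)                  # nearest point in the colored cloud
            rgbd[v, u, :3] = colored_points[idx, 3:6]   # its RGB value
            rgbd[v, u, 3] = d                           # depth value of the pixel
    return rgbd
```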
It should be noted that, when generating the depth map according to the distance between the color point cloud data and the panoramic camera, the sparse point cloud lacks sufficient density and resolution, which causes erroneous projections: the problem of "missing projection" (leak) caused by the sparsity of the point cloud map, the problem of false projection of occluded points, and the lack of point cloud support for fully occluded object surfaces, so that the occlusion relationship cannot be determined accurately. To address this, a depth buffer (Depth Buffer) is used to compare the depth value of the current point with the value stored in the depth buffer, and the color or feature of the pixel is updated only if the depth of the current point is smaller than the recorded depth value. For the occlusion problem, a 5×5 minimum filter is used for screening, which handles errors or noise introduced by projecting the sparse point cloud onto the image frame, thereby removing outliers or wrongly projected points.
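A minimal sketch of the depth-buffer test and the 5×5 minimum filter is shown below, assuming the points have already been projected to pixel coordinates; it illustrates the mechanism only and is not the patented implementation.

```python
import numpy as np
from scipy.ndimage import minimum_filter

def project_with_depth_buffer(uv, depths, colors, shape):
    """Z-buffered splatting of colored points followed by a 5x5 minimum filter.

    uv:     N x 2 integer pixel coordinates of the projected points.
    depths: N distances of the points to the panoramic camera.
    colors: N x 3 RGB values of the points.
    shape:  (H, W) of the target image.
    """
    H, W = shape
    depth_buf = np.full((H, W), np.inf, dtype=np.float32)
    color_buf = np.zeros((H, W, 3), dtype=np.uint8)

    for (u, v), d, c in zip(uv, depths, colors):
        if 0 <= u < W and 0 <= v < H and d < depth_buf[v, u]:
            # Update only when the point is closer than the recorded depth,
            # so occluded points do not overwrite visible surfaces.
            depth_buf[v, u] = d
            color_buf[v, u] = c

    # 5x5 minimum filter over the depth buffer to suppress leaked background
    # depths and outlier projections caused by point-cloud sparsity.
    filtered = minimum_filter(depth_buf, size=5)
    filtered[np.isinf(filtered)] = 0.0              # pixels never hit by any point
    return filtered, color_buf
```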
The data processing process not only realizes the accurate alignment of the point cloud information and the image data, but also lays a foundation for the subsequent semantic segmentation and three-dimensional reconstruction.
Step S105, inputting the hexahedral cube image into an object segmentation model, and dividing the indoor scene into a plurality of independent object regions.
Specifically, the object segmentation model of the present embodiment may be an instance segmentation model (Segment Anything Model, SAM model for short), which divides the indoor scene into a plurality of independent object regions by generating binary semantic masks for the hexahedral cube image. It should be noted that the SAM model can segment objects in any image through various interactive cues (such as points, boxes, text or masks) without fine-tuning for a specific task, thereby realizing object segmentation.
Further, the binarized semantic mask is projected to a three-dimensional point cloud space through internal parameters and external parameters of the camera, wherein the internal parameters are used for calculating projection coordinates, and the external parameters are used for describing the pose relation between the laser radar and the panoramic camera, so that the generation of the multi-mode data and object-level instance point cloud is realized.
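As an illustration of this segmentation step, the sketch below uses the automatic mask generator from the publicly released segment-anything package; the checkpoint path and model variant are placeholders, and the patent does not mandate this particular API.

```python
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

def segment_cube_face(image_rgb, checkpoint="sam_vit_h.pth"):
    """Split one cube-face image into independent object regions with SAM.

    image_rgb:  H x W x 3 uint8 RGB image (one face of the hexahedral cube image).
    checkpoint: path to a downloaded SAM checkpoint (placeholder name).
    Returns a list of binary masks, one per independent object region.
    """
    sam = sam_model_registry["vit_h"](checkpoint=checkpoint)
    mask_generator = SamAutomaticMaskGenerator(sam)
    results = mask_generator.generate(image_rgb)
    # Each result carries a boolean "segmentation" mask plus area/score metadata.
    return [r["segmentation"] for r in results]
```

The returned masks are the binary semantic masks that are subsequently projected into the three-dimensional point cloud space using the camera parameters.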
And S106, inputting each independent object area into a vision-language model to obtain semantic tags of each independent object.
Specifically, in this embodiment, in combination with the object segmentation schematic diagram of the indoor scene reconstruction method shown in fig. 6, the vision-language model is a contrastive language-image pre-training model (Contrastive Language-Image Pre-training, CLIP model for short), which performs feature extraction on the RGB image region of each independent object area, thereby generating open-vocabulary semantic tags.
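The labeling step can be illustrated with the publicly released CLIP package as follows; the candidate label list and model variant are assumptions, since open-vocabulary labeling in practice would score each region against a larger prompt set.

```python
import torch
import clip

def label_region(region_image, candidate_labels, device="cuda"):
    """Assign an open-vocabulary label to one cropped object region with CLIP.

    region_image:      PIL.Image of the independent object region.
    candidate_labels:  list of free-form text labels to score against.
    """
    model, preprocess = clip.load("ViT-B/32", device=device)
    image = preprocess(region_image).unsqueeze(0).to(device)
    text = clip.tokenize(candidate_labels).to(device)

    with torch.no_grad():
        logits_per_image, _ = model(image, text)           # image-text similarity
        probs = logits_per_image.softmax(dim=-1).squeeze(0)

    best = int(probs.argmax())
    return candidate_labels[best], float(probs[best])        # label and its score
```

For example, label_region(crop, ["a chair", "a table", "a red round object"]) returns the best-matching label together with its softmax score.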
Further, before step S106, a nearest point priority policy may be further set to perform an operation of eliminating projection errors on each independent object area, so as to ensure geometric accuracy of the object point cloud.
Specifically, for each semantic instance, a corresponding three-dimensional point cloud subset is extracted based on the projection relation or the preliminary segmentation result and used as the input region for subsequent processing. Considering the various depth deviations that image back-projection or multi-view synthesis may introduce, an error estimate between the observed distance and the ideal position of each point is established. For candidate points falling in the same pixel region or spatial neighborhood, a nearest-point priority strategy is adopted: only the point with the minimum distance to the panoramic camera is retained as the valid observation. By excluding redundant projection points that are farther away or occluded, geometric artifacts and ghosting caused by multi-view re-projection can be effectively reduced. The screened point cloud is used as a high-precision geometric representation and is input into the vision-language model for semantic label prediction, which improves recognition precision and robustness, as illustrated in the sketch below.
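A minimal sketch of the nearest-point priority strategy, assuming the candidate points of an instance have already been grouped by the pixel (or spatial cell) they project into:

```python
import numpy as np

def filter_instance_points(points, pixel_ids, cam_center):
    """Nearest-point-priority filtering of an instance's candidate points.

    points:     M x 3 candidate 3D points projected into the instance's mask.
    pixel_ids:  M integer ids of the pixel (or spatial cell) each point falls in.
    cam_center: 3-vector, position of the panoramic camera.
    Keeps, for every pixel id, only the point closest to the camera.
    """
    dist = np.linalg.norm(points - cam_center, axis=1)
    keep = {}
    for i, (pid, d) in enumerate(zip(pixel_ids, dist)):
        if pid not in keep or d < dist[keep[pid]]:
            keep[pid] = i            # retain the nearest observation only
    return points[sorted(keep.values())]
```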
Furthermore, considering situations where semantic objects are complex, objects are strongly combined, and a single independent semantic tag is not descriptive enough and may be ambiguous, in this embodiment the open-vocabulary semantic tag can label the category of the object (such as a chair or a table) and can also be supplemented with an attribute description (such as a red round object), providing multiple candidate labels for a complex object. For the supplementary attribute description, the specific procedure is as follows:
step S1061, searching for a corresponding independent object result in the reconstructed indoor scene according to the received user instruction.
Step S1062, the searched independent object result is sent to the user side.
Step S1063, if the result of the independent object searched by the user side is inconsistent with the user instruction, adding a supplementary semantic tag to the independent object area indicated by the user instruction. The user instruction is an object to be searched by a user adopting natural language description, a specific object description can be extracted from the user instruction through a natural language recognition model, and corresponding independent object results can be searched by traversing semantic tags of each independent object in the reconstructed indoor scene according to the object description.
The supplementary semantic tags include at least one of a color attribute tag, a shape attribute tag and a position attribute tag. The color attribute tag indicates the color attribute of the independent object, such as red, green or yellow; the shape attribute tag indicates the shape attribute of the independent object, such as round, rectangular or heart-shaped; and the position attribute tag indicates the specific position of the independent object in the reconstructed scene, such as upper left, the northeast corner, or between object A and object B. A supplementary semantic tag formed from at least one of the color, shape and position attribute tags can remedy the situation where the original tag in the reconstructed indoor scene is ambiguous or insufficiently specific. For example, the original tag may be "chair", but chairs of various colors and shapes exist at different positions in the reconstructed indoor scene; in this case, the independent object results of all chairs in the reconstructed indoor scene are fed back to the user side, and if the independent object results returned to the user side are inconsistent with the user instruction (for example, the user instruction asks for a chair with a red heart-shaped backrest), a supplementary semantic tag (such as "red heart-shaped backrest") is added to the independent object area indicated by the user instruction.
In addition, for the open-vocabulary semantic tags, when the user searches for a specific object at a later stage, a confidence evaluation can be generated in combination with the semantic tags, thereby improving the accuracy of the semantic annotation. The confidence evaluation refers to the accuracy of the returned result when the user searches for a specific object at a later stage.
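One plausible way to implement the search and confidence evaluation described above is to reuse CLIP's text encoder and rank the stored semantic tags against the user instruction by cosine similarity; this is an illustrative sketch, not a procedure prescribed by the patent.

```python
import torch
import clip

def search_objects(query, object_labels, device="cuda", top_k=3):
    """Rank reconstructed objects against a natural-language user instruction.

    query:         free-form description taken from the user instruction.
    object_labels: list of semantic tags (possibly with supplementary attributes).
    Returns the top_k (label index, similarity) pairs.
    """
    model, _ = clip.load("ViT-B/32", device=device)
    with torch.no_grad():
        q = model.encode_text(clip.tokenize([query]).to(device))
        t = model.encode_text(clip.tokenize(object_labels).to(device))
        q = q / q.norm(dim=-1, keepdim=True)
        t = t / t.norm(dim=-1, keepdim=True)
        sims = (q @ t.T).squeeze(0)                 # cosine similarity as confidence
    order = sims.argsort(descending=True)[:top_k]
    return [(int(i), float(sims[i])) for i in order]
```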
And step S107, performing projection alignment on each independent object containing the semantic tag and the RGB-D sequence to obtain aligned point cloud data.
And S108, inputting the aligned point cloud data into a neural kernel surface reconstruction model to obtain a reconstructed indoor scene.
Specifically, as shown in the scene reconstruction flow chart in fig. 4, in the scene reconstruction stage a neural kernel surface reconstruction model (Neural Kernel Surface Reconstruction, NKSR for short) is adopted to perform high-precision three-dimensional geometric reconstruction on the segmented object point clouds. The NKSR model uses a neural network to learn local geometric features in the point cloud data and predicts the three-dimensional surface of the object from these features. It should be noted that the NKSR model combines convolutional neural networks (CNN) with kernel methods (Kernel Methods) and, by learning the local geometric information of the point cloud, can reconstruct object surfaces efficiently and accurately, providing high-quality basic data for subsequent applications such as virtual reality, indoor navigation and scene simulation.
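For reference, the sketch below follows the call pattern published in the NKSR project's documentation for reconstructing one segmented object point cloud; the exact function signatures may differ between library versions, and the normals are assumed to have been estimated beforehand (for example from local neighborhoods).

```python
import torch
import nksr

def reconstruct_instance(xyz, normals, colors, device="cuda:0"):
    """Surface reconstruction of one segmented object point cloud with NKSR.

    xyz, normals, colors: N x 3 float tensors (points, normals, RGB in [0, 1]).
    The call pattern follows the NKSR project's published example and is an
    assumption here; verify against the installed library version.
    """
    device = torch.device(device)
    xyz, normals, colors = xyz.to(device), normals.to(device), colors.to(device)

    reconstructor = nksr.Reconstructor(device)
    field = reconstructor.reconstruct(xyz, normals)               # implicit surface
    field.set_texture_field(nksr.fields.PCNNField(xyz, colors))   # attach color texture
    mesh = field.extract_dual_mesh(mise_iter=2)                   # textured triangle mesh
    return mesh                                                   # vertices/faces/colors in mesh.v, mesh.f, mesh.c
```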
Based on the scheme, the handheld laser radar is combined with the panoramic camera, and the three-dimensional geometric information of the laser radar point cloud information and the texture and color information of the panoramic camera are accurately aligned, so that the limitation of a single mode is broken through, and standardized RGB-D data are generated. The multi-mode fusion mode effectively improves the comprehensiveness and detail expression capability of scene perception, and provides rich data support for subsequent semantic segmentation, three-dimensional modeling and path planning. In addition, the embodiment combines the SAM model and the CLIP model, realizes semantic segmentation of open vocabulary, can automatically generate semantic tags and instance segmentation results without additional data labeling or model training, breaks through category limitation compared with the traditional pre-defined category segmentation technology, has strong universality and adaptability, can identify unknown objects in complex scenes and endow the semantic tags, and greatly reduces model development cost and labeling difficulty.
Further, in this embodiment, the SAM model and the CLIP model are combined to detect objects and generate specific labels without being limited by predefined categories, but label ambiguity may sometimes cause unstable or inconsistent detection results. In this regard, referring to the semantic tag labeling flow schematic diagram shown in fig. 5, in this embodiment a Grounding DINO model (an open-set, text-prompted object detection model) may further be used to label semantic tags for each independent object region, and a class file is introduced as the predefined class label set of the object detection model, so as to ensure the consistency and repeatability of the segmentation results; the class file can be defined according to actual requirements, improving the stability of the segmentation results.
In order to better illustrate the advantages of the indoor scene reconstruction method in practical application, the indoor gas pipeline wiring and indoor robot navigation are described by using the method.
(1) Indoor gas pipeline wiring example:
The method comprises the steps of collecting point cloud data and a 360-degree panorama of a kitchen space through a handheld laser radar and a panoramic camera, converting the panorama into a six-sided perspective view, generating a depth map according to the distance between the point cloud and the panoramic camera, generating RGB-D data, combining SAM and CLIP to realize open vocabulary semantic segmentation, and automatically identifying and marking the positions and shapes of objects such as walls, ceilings, floors, furniture and equipment which possibly obstruct wiring. And generating a semantic tag through the CLIP, adding detailed attribute description for the segmented object, and performing high-precision three-dimensional reconstruction on the point cloud by utilizing NKSR algorithm to generate a kitchen three-dimensional scene model with rich geometric details and real textures.
On the basis, the optimal path of the gas pipeline is automatically planned by combining with the pipeline wiring rule, the bracket fixing points are designed, dangerous areas are marked, and the safety and the rationality of wiring are ensured. The indoor scene reconstruction method remarkably reduces the data acquisition and modeling cost while improving the design efficiency of the gas wiring, has the advantages of high convenience and intelligence, and provides powerful technical support for construction, maintenance and safety evaluation of the gas wiring.
(2) Indoor robot navigation scene:
Point cloud data of an indoor scene collected by the laser radar and a 360-degree panoramic image collected by the panoramic camera are used; the panoramic image is converted into a six-sided cube image, a depth map is generated according to the distance between the point cloud and the panoramic camera, RGB-D data are generated, and semantic tags are generated by the segmentation foundation model SAM together with CLIP, so that the system can accurately perceive and analyze the environment and build a three-dimensional map with semantics. Based on the semantic three-dimensional map, the robot can avoid obstacles and adjust its navigation path, planning an optimal path with algorithms such as A* or Dijkstra. This is effectively applied to fields such as home service, commercial logistics and intelligent inspection, and has the technical advantages and social value of intelligence, flexibility and efficiency.
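For the path-planning step, a minimal Dijkstra sketch on a 2D occupancy grid derived from the semantic map is shown below; the grid construction and any semantic cost weighting are assumptions, since the text only names the algorithms.

```python
import heapq

def dijkstra(grid, start, goal):
    """Shortest path on a 2D occupancy grid (0 = free, 1 = obstacle).

    A minimal sketch for the navigation step; a semantic map would additionally
    let specific object classes raise or forbid cell costs.
    """
    H, W = len(grid), len(grid[0])
    dist = {start: 0}
    prev = {}
    heap = [(0, start)]
    while heap:
        d, cur = heapq.heappop(heap)
        if cur == goal:
            break
        if d > dist.get(cur, float("inf")):
            continue                                  # stale heap entry
        y, x = cur
        for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
            if 0 <= ny < H and 0 <= nx < W and grid[ny][nx] == 0:
                nd = d + 1
                if nd < dist.get((ny, nx), float("inf")):
                    dist[(ny, nx)] = nd
                    prev[(ny, nx)] = cur
                    heapq.heappush(heap, (nd, (ny, nx)))
    # Reconstruct the path from goal back to start.
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev.get(node)
        if node is None:
            return []                                 # goal unreachable
    path.append(start)
    return path[::-1]
```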
It should be understood that, although the steps in the flowchart of fig. 1 are shown in order as indicated by the arrow, the steps are not necessarily performed in order as indicated by the arrow. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in fig. 1 may include a plurality of sub-steps or sub-stages, which are not necessarily performed at the same time, but may be performed at different times, and the order in which the sub-steps or stages are performed is not necessarily sequential, but may be performed in turn or alternately with at least some of the other steps or sub-steps of other steps.
The embodiment of the present invention describes the method for reconstructing an indoor scene in detail, and the method disclosed in the present invention can be implemented by using various types of devices, so that the present invention also discloses an apparatus for reconstructing an indoor scene, and a specific embodiment is given below with reference to fig. 7.
The image acquisition module 501 is used for acquiring laser radar point cloud information and panoramic RGB images;
A panorama-to-hexahedral module 502, configured to pre-process the panoramic RGB image to obtain a hexahedral cube image;
the point cloud data color extraction module 503 is configured to combine the internal parameters and the external parameters of the panoramic camera, project the laser radar point cloud information to a hexahedral cube image, extract color information of each point in the point cloud information, and generate color point cloud data;
a depth parameter combining module 504, configured to generate an RGB-D data sequence according to the distance between each point in the color point cloud data and the panoramic camera;
The object segmentation module 505 is configured to input the hexahedral cube image to an object segmentation model, and divide an indoor scene into a plurality of independent object regions;
the semantic annotation module 506 is configured to input each independent object region into a vision-language model to obtain a semantic label of each independent object;
The data fusion module 507 is configured to perform projection alignment on each independent object containing the semantic tag and the RGB-D sequence, so as to obtain aligned point cloud data;
The scene reconstruction module 508 is configured to input the aligned point cloud data into the neural kernel surface reconstruction model to obtain a reconstructed indoor scene.
The device for reconstructing the indoor scene can be fully referred to the above limitation of the method, and will not be repeated here. Each of the modules in the above-described apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of the processor of the terminal device, or may be stored in software in the memory of the terminal device, so that the processor invokes and executes the operations corresponding to the above modules.
In one embodiment, the present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of indoor scene reconstruction described above.
The computer readable storage medium may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM (erasable programmable read-only memory), a hard disk, or a ROM. Optionally, the computer readable storage medium comprises a non-transitory computer readable medium (non-transitory computer-readable storage medium). The computer readable storage medium has storage space for program code to perform any of the method steps described above. These program code can be read from or written to one or more computer program products, which can be compressed in a suitable form.
In one embodiment, the present invention provides a computer device comprising a memory storing a computer program and a processor executing the method of indoor scene reconstruction described above.
The computer device includes a memory, a processor, and one or more computer programs, wherein the one or more computer programs may be stored in the memory and configured to be executed by the one or more processors, and one or more application programs configured to perform the method of indoor scene reconstruction described above.
The processor may include one or more processing cores. The processor connects various parts of the overall computer device using various interfaces and lines, and performs the various functions of the computer device and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory and by invoking data stored in the memory. Optionally, the processor may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field-programmable gate array (Field-Programmable Gate Array, FPGA) and programmable logic array (Programmable Logic Array, PLA). The processor may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processing unit (Graphics Processing Unit, GPU) and a modem. The CPU mainly handles the operating system, the user interface, application programs and the like; the GPU is used for rendering and drawing display content; and the modem is used for handling wireless communication. It can be understood that the modem may not be integrated into the processor and may instead be implemented by a separate communication chip.
The Memory may include random access Memory (Random Access Memory, RAM) or Read-Only Memory (ROM). The memory may be used to store instructions, programs, code sets, or instruction sets. The memory may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like. The storage data area may also store data created by the terminal device in use, etc.
The foregoing embodiments are merely for illustrating the technical solution of the present invention, but not for limiting the same, and although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that modifications may be made to the technical solution described in the foregoing embodiments or equivalents may be substituted for parts of the technical features thereof, and that such modifications or substitutions do not depart from the spirit and scope of the technical solution of the embodiments of the present invention in essence.

Claims (11)

1. A method for indoor scene reconstruction, comprising:
acquiring laser radar point cloud information and a panoramic RGB image;
preprocessing the panoramic RGB image to obtain a hexahedral cube image;
projecting the laser radar point cloud information onto the hexahedral cube image in combination with internal parameters and external parameters of a panoramic camera, extracting color information of each point in the point cloud information, and generating color point cloud data;
generating an RGB-D data sequence according to the distance between each point in the color point cloud data and the panoramic camera;
inputting the hexahedral cube image into an object segmentation model to divide the indoor scene into a plurality of independent object areas;
inputting each independent object area into a vision-language model to obtain a semantic tag of each independent object;
performing projection alignment on each independent object containing a semantic tag and the RGB-D sequence to obtain aligned point cloud data; and
inputting the aligned point cloud data into a neural kernel surface reconstruction model to obtain a reconstructed indoor scene.

2. The method for indoor scene reconstruction according to claim 1, further comprising:
searching for a corresponding independent object result in the reconstructed indoor scene according to a received user instruction;
sending the searched independent object result to a user side; and
if the independent object result fed back by the user side is inconsistent with the user instruction, adding a supplementary semantic tag to the independent object area indicated by the user instruction.

3. The method for indoor scene reconstruction according to claim 1, wherein the object segmentation model is an instance segmentation model.

4. The method for indoor scene reconstruction according to claim 3, wherein the vision-language model is a contrastive language-image pre-training model or a Grounding DINO model.

5. The method for indoor scene reconstruction according to claim 4, wherein preprocessing the panoramic RGB image to obtain a hexahedral cube image specifically comprises:
converting the panoramic RGB image into the hexahedral cube image by equidistant cylindrical (equirectangular) projection.

6. The method for indoor scene reconstruction according to claim 5, wherein projecting the laser radar point cloud information onto the hexahedral cube image in combination with the internal parameters and external parameters of the panoramic camera, extracting the color information of each point in the point cloud information, and generating color point cloud data comprises the following steps, performed for any target point in the laser point cloud information:
step S201, correcting the coordinates of the target point according to the external parameters of the panoramic camera to obtain corrected target point coordinates;
step S202, rotating the corrected target point coordinates according to the internal parameters of the panoramic camera to obtain target point mapping coordinates;
step S203, obtaining the projection point coordinates of the target point on the hexahedral cube in combination with the width and the height of the panoramic RGB image;
step S204, recording the color data of the projection point as the color data of the target point;
step S205, repeating steps S201-S204 until color data has been determined for every target point in the laser point cloud information, thereby generating the color point cloud data.

7. The method for indoor scene reconstruction according to claim 6, wherein generating the RGB-D data sequence according to the distance between each point in the color point cloud data and the panoramic camera comprises:
generating a depth map according to the distance between each point in the color point cloud data and the panoramic camera; and
combining the depth map with the color point cloud data to obtain the RGB-D data sequence.

8. The method for indoor scene reconstruction according to claim 7, wherein combining the depth map with the color point cloud data to obtain the RGB-D data sequence further comprises:
for each target point of the color point cloud data, if the depth value of the target point is smaller than the historical depth value, updating the color feature of the target point.

9. An apparatus for indoor scene reconstruction, comprising:
an image acquisition module for acquiring laser radar point cloud information and a panoramic RGB image;
a panorama-to-hexahedron module for preprocessing the panoramic RGB image to obtain a hexahedral cube image;
a point cloud data color extraction module for projecting the laser radar point cloud information onto the hexahedral cube image in combination with internal parameters and external parameters of a panoramic camera, extracting color information of each point in the point cloud information, and generating color point cloud data;
a depth parameter combination module for generating an RGB-D data sequence according to the distance between each point in the color point cloud data and the panoramic camera;
an object segmentation module for inputting the hexahedral cube image into an object segmentation model to divide the indoor scene into a plurality of independent object areas;
a semantic annotation module for inputting each independent object area into a vision-language model to obtain a semantic tag of each independent object;
a data fusion module for performing projection alignment on each independent object containing a semantic tag and the RGB-D sequence to obtain aligned point cloud data; and
a scene reconstruction module for inputting the aligned point cloud data into a neural kernel surface reconstruction model to obtain a reconstructed indoor scene.

10. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method for indoor scene reconstruction according to any one of claims 1 to 8.

11. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, performs the method for indoor scene reconstruction according to any one of claims 1 to 8.
CN202510653377.5A | 2025-05-21 | 2025-05-21 | Method, device, storage medium and equipment for reconstructing indoor scene | Active | CN120182509B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202510653377.5A CN120182509B (en) | 2025-05-21 | 2025-05-21 | Method, device, storage medium and equipment for reconstructing indoor scene

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202510653377.5A CN120182509B (en) | 2025-05-21 | 2025-05-21 | Method, device, storage medium and equipment for reconstructing indoor scene

Publications (2)

Publication Number | Publication Date
CN120182509A (en) | 2025-06-20
CN120182509B (en) | 2025-08-12

Family

ID=96039800

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202510653377.5A Active CN120182509B (en) | 2025-05-21 | 2025-05-21 | Method, device, storage medium and equipment for reconstructing indoor scene

Country Status (1)

Country | Link
CN (1) | CN120182509B (en)


Citations (5)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN115731240A (en)* | 2021-08-26 | 2023-03-03 | 广州视源电子科技股份有限公司 | A segmentation method, device, electronic equipment and storage medium
US20240119697A1 (en)* | 2021-11-16 | 2024-04-11 | Google Llc | Neural Semantic Fields for Generalizable Semantic Segmentation of 3D Scenes
CN118505911A (en)* | 2024-03-27 | 2024-08-16 | 南京信息工程大学 | Indoor scene reconstruction method, system and storage medium
CN118840486A (en)* | 2024-07-05 | 2024-10-25 | 三星电子(中国)研发中心 | Three-dimensional reconstruction method and device of scene, electronic equipment and storage medium
CN119992082A (en)* | 2024-12-30 | 2025-05-13 | 东南大学 | A semantic segmentation method and system for scene-level synthetic point cloud enhancement based on diffusion model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WUFAN ZHAO et al.: "Rotation-Aware Building Instance Segmentation From High-Resolution Remote Sensing Images", IEEE Geoscience and Remote Sensing Letters, 17 August 2022 (2022-08-17), pages 1-5 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN120526066A (en)* | 2025-07-25 | 2025-08-22 | 兰笺(苏州)科技有限公司 | Panoramic image three-dimensional reconstruction method based on 3DGS
CN120526066B (en)* | 2025-07-25 | 2025-10-03 | 兰笺(苏州)科技有限公司 | Panoramic image three-dimensional reconstruction method based on 3DGS

Also Published As

Publication number | Publication date
CN120182509B (en) | 2025-08-12

Similar Documents

Publication | Publication Date | Title
Zhang et al. | A review of deep learning-based semantic segmentation for point cloud
TW202034215A (en) | Mapping object instances using video data
CN110458939A (en) | Indoor scene modeling method based on perspective generation
CN120182509B (en) | Method, device, storage medium and equipment for reconstructing indoor scene
CN114638866B (en) | A point cloud registration method and system based on local feature learning
CN114758337A (en) | A semantic instance reconstruction method, apparatus, device and medium
CN112927353A (en) | Three-dimensional scene reconstruction method based on two-dimensional target detection and model alignment, storage medium and terminal
CN115719436A (en) | Model training method, target detection method, device, equipment and storage medium
US12347141B2 (en) | Method and apparatus with object pose estimation
CN117576303A (en) | Three-dimensional image generation method, device, equipment and storage medium
Li et al. | Polarmesh: A star-convex 3d shape approximation for object pose estimation
Liang et al. | Material augmented semantic segmentation of point clouds for building elements
Akagic et al. | Computer vision with 3d point cloud data: Methods, datasets and challenges
Chang et al. | EI-MVSNet: Epipolar-guided multi-view stereo network with interval-aware label
Lyu et al. | 3DOPFormer: 3D occupancy perception from multi-camera images with directional and distance enhancement
CN112001247A (en) | Multi-target detection method, equipment and storage device
Qian et al. | Context-aware transformer for 3d point cloud automatic annotation
Jang et al. | Two-phase approach for monocular object detection and 6-dof pose estimation
CN113191462A (en) | Information acquisition method, image processing method and device and electronic equipment
Wong et al. | Factored neural representation for scene understanding
Lin et al. | 6D object pose estimation with pairwise compatible geometric features
Meng et al. | 3D indoor scene geometry estimation from a single omnidirectional image: A comprehensive survey
Wang et al. | Aerial-terrestrial Image Feature Matching: An Evaluation of Recent Deep Learning Methods
Tang et al. | Implicit guidance and explicit representation of semantic information in points cloud: a survey
Jiao et al. | NEHand: Enhancing Hand Pose Estimation in the Wild through Synthetic and Motion Capture Datasets

Legal Events

Date | Code | Title | Description
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
