"Gradio" Interface for SpatialLM Model | A 3D Large Language Model for Structured Scene Understanding, Processing Point Cloud Data from Monocular Videos, RGBD Images, and LiDAR.
SpatialLM: A 3D Large Language Model for Structured Scene Understanding, Processing Point Cloud Data from Monocular Videos, RGBD Images, and LiDAR.
SpatialLM is a 3D large language model designed to process 3D point cloud data and generate structured 3D scene understanding outputs. These outputs include architectural elements such as walls, doors, and windows, as well as oriented object bounding boxes with their semantic categories. Unlike previous methods that require specialized equipment for data collection, SpatialLM can handle point clouds from diverse sources such as monocular video sequences, RGBD images, and LiDAR sensors. This multimodal architecture bridges the gap between unstructured 3D geometric data and structured 3D representations, offering high-level semantic understanding. It enhances spatial reasoning capabilities for applications in embodied robotics, autonomous navigation, and other complex 3D scene analysis tasks.

Project Page | Official Code
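To make the output format described above concrete, here is a small illustrative sketch of how one "oriented object bounding box with a semantic category" could be represented in Python. The class and field names are hypothetical, chosen for illustration only; they are not SpatialLM's actual output schema.

```python
from dataclasses import dataclass

# Hypothetical illustration of one oriented object bounding box with a
# semantic category; not SpatialLM's actual output schema.
@dataclass
class OrientedBBox:
    category: str                       # semantic class, e.g. "sofa"
    center: tuple[float, float, float]  # (x, y, z) box centre
    size: tuple[float, float, float]    # extents along each local axis
    heading: float                      # rotation about the z (up) axis, in radians

example = OrientedBBox(category="sofa", center=(1.2, 0.4, 0.3),
                       size=(2.0, 0.9, 0.8), heading=0.5)
print(example)
```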
| Model | Download |
|---|---|
| SpatialLM-Llama-1B | 🤗 HuggingFace |
| SpatialLM-Qwen-0.5B | 🤗 HuggingFace |
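To fetch a checkpoint programmatically rather than through the web UI, a minimal sketch with `huggingface_hub` is shown below. The repo id `manycore-research/SpatialLM-Llama-1B` and the local directory are assumptions; adjust them to match the actual HuggingFace links in the table above.

```python
# Minimal sketch: download a SpatialLM checkpoint from the HuggingFace Hub.
# The repo id and local_dir below are assumptions; match them to the table above.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="manycore-research/SpatialLM-Llama-1B",
    local_dir="checkpoints/SpatialLM-Llama-1B",
)
print(f"Checkpoint downloaded to: {local_path}")
```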
Tested with the following environment:
- Python 3.11
- PyTorch 2.4.1
- CUDA Version 12.4
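A quick sanity check, assuming PyTorch is already installed, that the interpreter sees the versions listed above:

```python
# Verify the tested PyTorch / CUDA combination is what the environment provides.
import torch

print(torch.__version__)          # expected: 2.4.1 (possibly with a +cu124 suffix)
print(torch.version.cuda)         # expected: 12.4
print(torch.cuda.is_available())  # should be True on a CUDA-capable machine
```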
```bash
# clone the repository
git clone https://github.com/manycore-research/SpatialLM-Gradio.git
cd SpatialLM-Gradio

# create a conda environment with cuda 12.4
conda create -n spatiallm-gradio python=3.11
conda activate spatiallm-gradio
conda install -y nvidia/label/cuda-12.4.0::cuda-toolkit conda-forge::sparsehash

# Install dependencies with poetry
pip install poetry && poetry config virtualenvs.create false --local
poetry install
poe install-torchsparse  # Building wheel for torchsparse will take a while
pip install gradio_rerun
```
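Once installation finishes, a quick import check (a sketch, assuming the steps above all succeeded) confirms the project-specific packages are usable, in particular the torchsparse wheel that was built from source:

```python
# Confirm the packages installed above import cleanly.
import gradio
import gradio_rerun          # Rerun viewer component for Gradio
import torchsparse           # built from source via `poe install-torchsparse`

print("gradio", gradio.__version__)
print("torchsparse", torchsparse.__version__)
```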
In the current version of SpatialLM, input point clouds are assumed to be axis-aligned with the z-axis as the up axis. This orientation is crucial for maintaining consistency in spatial understanding and scene interpretation across different datasets and applications. Example preprocessed point clouds, reconstructed from RGB videos using MASt3R-SLAM, are available in SpatialLM-Testset.
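If your reconstruction pipeline produces a y-up point cloud instead, it should be rotated into this z-up frame before being passed to SpatialLM. A minimal NumPy sketch, assuming the points have already been loaded from the .ply file into an (N, 3) array:

```python
import numpy as np

# Hypothetical points in a y-up frame (up stored in the second column).
points_y_up = np.array([[1.0, 2.0, 0.5],
                        [0.0, 1.5, -0.3]])

# Rotate +90 degrees about the x-axis so the old +y (up) maps to the new +z (up).
R_x90 = np.array([[1.0, 0.0,  0.0],
                  [0.0, 0.0, -1.0],
                  [0.0, 1.0,  0.0]])

points_z_up = points_y_up @ R_x90.T  # apply the rotation to every point
print(points_z_up)                   # the up component now sits in the z column
```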
Download an example point cloud:
```bash
huggingface-cli download manycore-research/SpatialLM-Testset pcd/scene0000_00.ply --repo-type dataset --local-dir .
```
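With a point cloud on disk (for example the pcd/scene0000_00.ply file downloaded above), it can be fed to the model through a Gradio interface. The sketch below is not this repository's actual app; `run_spatiallm` is a hypothetical stub standing in for the real inference call.

```python
import gradio as gr

def run_spatiallm(ply_path: str) -> str:
    # Hypothetical stub: the real pipeline would load the .ply here,
    # run SpatialLM, and return the structured layout (walls, doors,
    # windows, oriented bounding boxes) as text.
    return f"Would run SpatialLM on: {ply_path}"

demo = gr.Interface(
    fn=run_spatiallm,
    inputs=gr.File(label="Point cloud (.ply)", file_types=[".ply"], type="filepath"),
    outputs=gr.Textbox(label="Predicted scene layout"),
    title="SpatialLM demo (sketch)",
)

if __name__ == "__main__":
    demo.launch()
```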
SpatialLM-Llama-1B is derived from Llama3.2-1B-Instruct, which is licensed under the Llama3.2 license. SpatialLM-Qwen-0.5B is derived from the Qwen-2.5 series, originally licensed under the Apache 2.0 License.

All models are built upon the SceneScript point cloud encoder, licensed under the CC-BY-NC-4.0 License. TorchSparse, utilized in this project, is licensed under the MIT License.
I would like to thank the following projects that made this work possible: