Code repository for the paper "Tracking People by Predicting 3D Appearance, Location & Pose". (CVPR 2022 Oral)

Tracking People by Predicting 3D Appearance, Location & Pose

Code repository for the paper "Tracking People by Predicting 3D Appearance, Location & Pose".
Jathushan Rajasegaran, Georgios Pavlakos, Angjoo Kanazawa, Jitendra Malik.

This repository provides the code implementation of our paper PHALP, including installation instructions, demo code to run on any video, dataset preparation, and evaluation on datasets.

This branch contains code supporting our latest work: 4D-Humans.
For the original PHALP code, please see the initial release branch.

Installation

After installing the PyTorch dependency, you may install our `phalp` package directly as:

pip install phalp[all]@git+https://github.com/brjathu/PHALP.git

Step-by-step instructions

git clone https://github.com/brjathu/PHALP.git
cd PHALP
conda create -n phalp python=3.10
conda activate phalp
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -e .[all]
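After these steps, one quick way to confirm that the packages are visible from the active environment is to look them up with `importlib`. This is a minimal sketch of our own: `check_phalp_environment` is a hypothetical helper, not part of the `phalp` package, and it only imports `torch` (to query CUDA) when `torch` is actually installed.

```python
import importlib.util


def check_phalp_environment() -> dict:
    """Report which of the packages installed above can be found.

    Hypothetical helper for illustration; it is not part of the phalp package.
    """
    # find_spec returns None for a top-level module that is not installed.
    report = {
        name: importlib.util.find_spec(name) is not None
        for name in ("torch", "torchvision", "phalp")
    }
    report["cuda_available"] = False
    if report["torch"]:
        import torch  # imported only when installed

        report["cuda_available"] = bool(torch.cuda.is_available())
    return report


print(check_phalp_environment())
```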

Demo

To run our code on a video, please specify the input video `video.source` and an output directory `video.output_dir`:

python scripts/demo.py video.source=assets/videos/gymnasts.mp4 video.output_dir='outputs'
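If you prefer to launch the demo from Python (for example, to process several videos in a loop), the command above can be assembled as an argument list and passed to `subprocess`. A minimal sketch, assuming it is run from the repository root; `build_demo_command`, `run_demo`, and the `__`-for-`.` keyword convention are our own and not part of PHALP.

```python
import subprocess
import sys
from typing import List


def build_demo_command(source: str, output_dir: str, **overrides) -> List[str]:
    """Assemble the argv for scripts/demo.py from key=value overrides.

    Keyword names use '__' in place of '.', so render__type="HUMAN_BBOX"
    becomes "render.type=HUMAN_BBOX" (a convention of this sketch only).
    """
    argv = [
        sys.executable,
        "scripts/demo.py",
        f"video.source={source}",
        f"video.output_dir={output_dir}",
    ]
    for key, value in overrides.items():
        # True/False are formatted as "True"/"False", matching the examples here.
        argv.append(f"{key.replace('__', '.')}={value}")
    return argv


def run_demo(source: str, output_dir: str, **overrides) -> None:
    # No shell is involved, so plain paths need no extra quoting.
    subprocess.run(build_demo_command(source, output_dir, **overrides), check=True)


# Example (run from the repository root):
# run_demo("assets/videos/gymnasts.mp4", "outputs", render__type="HUMAN_BBOX")
```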

The output directory will contain a video rendering of the tracklets and a `.pkl` file containing the tracklets with 3D pose and shape (see the structure below).

Command-line options

Input Sources

You can specify various kinds of input sources, for example a video file, a YouTube video, or a directory of images:

# for a video file
python scripts/demo.py video.source=assets/videos/vid.mp4
# for a youtube video
python scripts/demo.py video.source=\'"https://www.youtube.com/watch?v=xEH_5T9jMVU"\'
# for a directory of images
python scripts/demo.py video.source=<directory_path>
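In the YouTube example, the shell turns `video.source=\'"https://www.youtube.com/watch?v=xEH_5T9jMVU"\'` into the single argument `video.source='https://www.youtube.com/watch?v=xEH_5T9jMVU'`, so the URL reaches the script wrapped in single quotes (presumably so that characters such as `?` and `=` stay part of one string value). When no shell is involved, for instance with `subprocess`, that argument can be built directly. A sketch; `youtube_source_arg` is a hypothetical helper.

```python
import shlex
import sys


def youtube_source_arg(url: str) -> str:
    """Return the override exactly as the shell example above delivers it."""
    if "'" in url:
        raise ValueError("this sketch does not handle URLs containing single quotes")
    return f"video.source='{url}'"


argv = [
    sys.executable,
    "scripts/demo.py",
    youtube_source_arg("https://www.youtube.com/watch?v=xEH_5T9jMVU"),
]
# shlex.join shows a shell-safe equivalent of the argument list.
print(shlex.join(argv))
```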

Custom bounding boxes

In addition to these options, you can also give images and bounding boxes as inputs, so that the model only performs tracking on the given bounding boxes. To do this, specify `video.source` as a `.pkl` file in which each key is a frame name; the absolute path to the image is computed as `os.path.join(video.base_path, frame_name)`. The value of each key is a dictionary with the keys `gt_bbox`, `gt_class`, and `gt_track_id`, where the last two are nested under `extra_data`. Please see the following example. `gt_boxes` is an `np.ndarray` of shape `(N, 4)` where each row is a bounding box in the format `[x1, y1, x2, y2]`. You can also give `gt_class` and `gt_track_id` to store them in the final output.

gt_data[frame_id] = {
    "gt_bbox": gt_boxes,
    "extra_data": {
        "gt_class": [],
        "gt_track_id": [],
    }
}
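A sketch of building and saving such a file with NumPy, assuming frame names like `000001.jpg` (the naming produced by the `%06d.jpg` pattern in the `ffmpeg` command below); `make_gt_entry`, `save_gt_data`, and the output file name are hypothetical. It writes the file with the standard `pickle` module; this README does not say how PHALP reads the file, so if loading fails, `joblib.dump` (the library used to load the output `.pkl`) may be a closer match.

```python
import pickle

import numpy as np


def make_gt_entry(boxes_xyxy, classes=None, track_ids=None) -> dict:
    """Pack one frame's boxes into the gt_data layout (hypothetical helper)."""
    gt_boxes = np.asarray(boxes_xyxy, dtype=np.float32).reshape(-1, 4)
    # Each row is [x1, y1, x2, y2]; reject rows where the corners are swapped.
    if np.any(gt_boxes[:, 2] < gt_boxes[:, 0]) or np.any(gt_boxes[:, 3] < gt_boxes[:, 1]):
        raise ValueError("boxes must be given as [x1, y1, x2, y2]")
    classes = [] if classes is None else list(classes)
    track_ids = [] if track_ids is None else list(track_ids)
    # Optional per-box metadata must line up with the boxes when provided.
    for name, values in (("gt_class", classes), ("gt_track_id", track_ids)):
        if values and len(values) != len(gt_boxes):
            raise ValueError(f"{name} needs one entry per box")
    return {
        "gt_bbox": gt_boxes,
        "extra_data": {"gt_class": classes, "gt_track_id": track_ids},
    }


def save_gt_data(gt_data: dict, path: str) -> None:
    with open(path, "wb") as f:
        pickle.dump(gt_data, f)


# Example: two people in the first frame, one in the second.
gt_data = {
    "000001.jpg": make_gt_entry([[10, 20, 110, 220], [300, 40, 380, 260]],
                                classes=[0, 0], track_ids=[1, 2]),
    "000002.jpg": make_gt_entry([[12, 22, 112, 224]], classes=[0], track_ids=[1]),
}
# save_gt_data(gt_data, "my_gt_tracks.pkl")  # hypothetical file name
```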

Here is an example of how to give bounding boxes and track IDs to the model and get the renderings:

mkdir assets/videos/gymnasts
ffmpeg -i assets/videos/gymnasts.mp4 -q:v 2 assets/videos/gymnasts/%06d.jpg
python scripts/demo.py \
    render.enable=True \
    video.output_dir=test_gt_bbox \
    use_gt=True \
    video.base_path=assets/videos/gymnasts \
    video.source=assets/videos/gt_tracks.pkl
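Because each image path is built as `os.path.join(video.base_path, frame_name)`, a mismatch between the keys of the `.pkl` file and the extracted frames is an easy mistake to make. A small pre-flight check can catch it before running the demo. This sketch assumes the file was written with the standard `pickle` module (use `joblib.load` instead if it was written with joblib); `missing_frames` is a hypothetical helper.

```python
import os
import pickle
from typing import List


def missing_frames(gt_pkl_path: str, base_path: str) -> List[str]:
    """Return frame names from the gt .pkl whose image does not exist on disk."""
    with open(gt_pkl_path, "rb") as f:
        gt_data = pickle.load(f)
    # Same path rule as described above: os.path.join(video.base_path, frame_name).
    return sorted(
        name for name in gt_data
        if not os.path.isfile(os.path.join(base_path, name))
    )


# Example, for the gymnasts commands above:
# print(missing_frames("assets/videos/gt_tracks.pkl", "assets/videos/gymnasts"))
```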

Running on a subset of frames

You can specify the start and end of the video to be tracked, e.g. track from frame 50 to 100:

python scripts/demo.py video.source=assets/videos/vid.mp4 video.start_frame=50 video.end_frame=100

Tracking without extracting frames

If the video is too long and extracting the frames is too time-consuming, you can set `video.extract_video=False`. This uses the torchvision backend and keeps only the timestamps of the video in memory. When this option is enabled, you can give the start and end times of the video in seconds:

python scripts/demo.py video.source=assets/videos/vid.mp4 video.extract_video=False video.start_time=1s video.end_time=2s
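If you know a frame range but want to use this mode, the frame indices can be converted into the `<seconds>s` values shown above. A sketch, assuming a known constant frame rate `fps` and whole-second values as in the example (whether fractional seconds are accepted is not stated here); the window is rounded outward so the requested frames stay inside it. `frame_range_to_time_overrides` is a hypothetical name.

```python
import math
from typing import List


def frame_range_to_time_overrides(start_frame: int, end_frame: int, fps: float) -> List[str]:
    """Map a frame range to video.start_time / video.end_time overrides.

    Rounds the start down and the end up to whole seconds.
    """
    if fps <= 0 or end_frame < start_frame:
        raise ValueError("need fps > 0 and end_frame >= start_frame")
    start_s = math.floor(start_frame / fps)
    end_s = math.ceil(end_frame / fps)
    return [
        "video.extract_video=False",
        f"video.start_time={start_s}s",
        f"video.end_time={end_s}s",
    ]


print(frame_range_to_time_overrides(50, 100, fps=30))
```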

Visualization type

We support multiple types of visualization in `render.type`: `HUMAN_MESH` (default) renders the full human mesh, `HUMAN_MASK` visualizes the segmentation masks, `HUMAN_BBOX` visualizes the bounding boxes with track IDs, and `TRACKID_<id>_MESH` renders the full human mesh but for track `<id>` only:

# render full human mesh
python scripts/demo.py video.source=assets/videos/vid.mp4 render.type=HUMAN_MESH
# render segmentation mask
python scripts/demo.py video.source=assets/videos/vid.mp4 render.type=HUMAN_MASK
# render bounding boxes with track-ids
python scripts/demo.py video.source=assets/videos/vid.mp4 render.type=HUMAN_BBOX
# render a single track id, say 0
python scripts/demo.py video.source=assets/videos/vid.mp4 render.type=TRACKID_0_MESH
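When scripting renders, for example one `TRACKID_<id>_MESH` video per track, it can help to build and check the `render.type` value in one place. A minimal sketch; the helper names are hypothetical and the accepted values are only the ones listed above.

```python
import re
from typing import Optional

RENDER_TYPES = ("HUMAN_MESH", "HUMAN_MASK", "HUMAN_BBOX")
_TRACK_MESH = re.compile(r"TRACKID_(\d+)_MESH")


def render_type_override(kind: str = "HUMAN_MESH", track_id: Optional[int] = None) -> str:
    """Return a render.type=... override; track_id selects TRACKID_<id>_MESH."""
    if track_id is not None:
        if int(track_id) < 0:
            raise ValueError("track_id must be non-negative")
        return f"render.type=TRACKID_{int(track_id)}_MESH"
    if kind not in RENDER_TYPES:
        raise ValueError(f"unknown render type {kind!r}; expected one of {RENDER_TYPES}")
    return f"render.type={kind}"


def track_id_from_render_type(value: str) -> Optional[int]:
    """Extract <id> from TRACKID_<id>_MESH, or return None for other types."""
    match = _TRACK_MESH.fullmatch(value)
    return int(match.group(1)) if match else None


print([render_type_override(track_id=t) for t in (0, 3)])
```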

More rendering types

In addition to these settings, PHALP can use a head-mask visualization when rendering meshes, which renders only the upper body of the person so that users can see both the actual person and the track in the same video. To enable this, please set `render.head_mask=True`.

# for rendering detected and occluded people
python scripts/demo.py video.source=assets/videos/vid.mp4 render.head_mask=True

You can also visualize the 2D projected keypoints by setting `render.show_keypoints=True` [TODO].

Track through shot-boundaries

By default, PHALP does not track through shot boundaries. To enable this, please set `detect_shots=True`.

# for tracking through shot boundaries
python scripts/demo.py video.source=assets/videos/vid.mp4 detect_shots=True
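Each frame entry in the output `.pkl` (described under Output `.pkl` structure) carries a `shot` number and a `time` index, so results can be split per shot after the run. A sketch over the loaded dictionary, assuming those two fields are present in every frame entry; `frames_by_shot` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def frames_by_shot(results: dict) -> Dict[int, List[str]]:
    """Group frame names by their 'shot' number, in order of 'time'."""
    shots = defaultdict(list)
    for name, frame in sorted(results.items(), key=lambda item: item[1]["time"]):
        shots[int(frame["shot"])].append(name)
    return dict(shots)


# Example with a hand-made stand-in for the loaded results:
demo = {
    "f1.jpg": {"time": 1, "shot": 0},
    "f0.jpg": {"time": 0, "shot": 0},
    "f2.jpg": {"time": 2, "shot": 1},
}
print(frames_by_shot(demo))
```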

Additional Notes

For debugging purposes, you can set `debug=True` to disable the rich progress bar.

Output `.pkl` structure

The `.pkl` file containing tracks, 3D poses, etc. is stored under `<video.output_dir>/results` and is a 2-level dictionary:

Detailed structure

import joblib

results = joblib.load("<video.output_dir>/results/<video_name>.pkl")

results = {
  # A dictionary for each frame.
  'vid_frame0.jpg': {
    '2d_joints': List[np.array(90,)],   # 45x 2D joints for each detection
    '3d_joints': List[np.array(45,3)],  # 45x 3D joints for each detection
    'annotations': List[Any],           # custom annotations for each detection
    'appe': List[np.array(4096,)],      # appearance features for each detection
    'bbox': List[[x0 y0 w h]],          # 2D bounding box (top-left corner and dimensions) for each track (detections + ghosts)
    'camera': List[[tx ty tz]],         # camera translation (wrt image) for each detection
    'camera_bbox': List[[tx ty tz]],    # camera translation (wrt bbox) for each detection
    'center': List[[cx cy]],            # 2D center of bbox for each detection
    'class_name': List[int],            # class ID for each detection (0 for humans)
    'conf': List[float],                # confidence score for each detection
    'frame_path': 'vid_frame0.jpg',     # Frame identifier
    'loca': List[np.array(99,)],        # location features for each detection
    'mask': List[mask],                 # RLE-compressed mask for each detection
    'pose': List[np.array(229,)],       # pose feature (concatenated SMPL params) for each detection
    'scale': List[float],               # max(width, height) for each detection
    'shot': int,                        # Shot number
    'size': List[[imgw imgh]],          # Image dimensions for each detection
    'smpl': List[Dict_SMPL],            # SMPL parameters for each detection: betas (10), body_pose (23x3x3), global_orient (3x3)
    'tid': List[int],                   # Track ID for each detection
    'time': int,                        # Frame number
    'tracked_bbox': List[[x0 y0 w h]],  # 2D bounding box (top-left corner and dimensions) for each detection
    'tracked_ids': List[int],           # Track ID for each detection
    'tracked_time': List[int],          # for each detection, time since it was last seen
  },
  'vid_frame1.jpg': {
    ...
  },
  ...
}
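To work with whole tracks rather than frames, the per-frame lists can be regrouped by track ID. A sketch, assuming `tracked_ids` and `tracked_bbox` are index-aligned (both are described as per-detection above) and that boxes are `[x0, y0, w, h]`; the helper names are hypothetical. `xywh_to_xyxy` converts a box to the `[x1, y1, x2, y2]` layout used by the custom bounding box input.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def tracks_from_results(results: dict) -> Dict[int, List[Tuple[int, List[float]]]]:
    """Collect (time, [x0, y0, w, h]) pairs per track ID, sorted by time."""
    tracks = defaultdict(list)
    for frame in results.values():
        # Assumes the i-th tracked ID belongs to the i-th tracked box.
        for tid, box in zip(frame["tracked_ids"], frame["tracked_bbox"]):
            tracks[int(tid)].append((int(frame["time"]), [float(v) for v in box]))
    for items in tracks.values():
        items.sort(key=lambda item: item[0])
    return dict(tracks)


def xywh_to_xyxy(box: List[float]) -> List[float]:
    """Convert [x0, y0, w, h] (top-left corner + size) to [x1, y1, x2, y2]."""
    x0, y0, w, h = box
    return [x0, y0, x0 + w, y0 + h]


# Usage, loading the file with joblib as shown above:
# import joblib
# results = joblib.load("<video.output_dir>/results/<video_name>.pkl")
# tracks = tracks_from_results(results)
```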

Postprocessing pipeline

Coming soon.

Training and Evaluation

Coming soon.

Acknowledgements

Parts of the code are taken or adapted from the following repos:

Citation

If you find this code useful for your research or use data generated by our method, please consider citing the following paper:

@inproceedings{rajasegaran2022tracking,
  title={Tracking People by Predicting 3{D} Appearance, Location \& Pose},
  author={Rajasegaran, Jathushan and Pavlakos, Georgios and Kanazawa, Angjoo and Malik, Jitendra},
  booktitle={CVPR},
  year={2022}
}
