Official repository for the paper [Robust High-Resolution Video Matting with Temporal Guidance](https://peterl1n.github.io/RobustVideoMatting/). RVM is specifically designed for robust human video matting. Unlike existing neural models that process frames as independent images, RVM uses a recurrent neural network to process videos with temporal memory. RVM performs real-time matting on any video without additional inputs, achieving **4K 76FPS** and **HD 104FPS** on an Nvidia GTX 1080 Ti GPU. The project was developed at [ByteDance Inc.](https://www.bytedance.com/)
<br>

## News
* [Aug 25 2021] Source code and pretrained models are published.
* [Jul 27 2021] Paper is accepted by WACV 2022.
<br>
## Showreel
Watch the showreel video ([YouTube](https://youtu.be/Jvzltozpbpk), [Bilibili](https://www.bilibili.com/video/BV1Z3411B7g7/)) to see the model's performance.
<p align="center">
    <a href="https://youtu.be/Jvzltozpbpk">
        <img src="documentation/image/showreel.gif">
    </a>
</p>
All footage in the video is available in [Google Drive](https://drive.google.com/drive/folders/1VFnWwuu-YXDKG-N6vcjK_nL7YZMFapMU?usp=sharing) and [Baidu Pan](https://pan.baidu.com/s/1igMteDwN5rO1Sn7YIhBlvQ) (code: tb3w).
<br>
## Demo
* [Webcam Demo](https://peterl1n.github.io/RobustVideoMatting/#/demo): Run the model live in your browser. Visualize recurrent states.
* [Colab Demo](https://colab.research.google.com/drive/10z-pNKRnVNsp0Lq9tH1J_XPZ7CBC_uHm?usp=sharing): Test our model on your own videos with a free GPU.
<br>
## Download
We recommend the MobileNetV3 models for most use cases. The ResNet50 models are a larger variant with marginal performance improvements. Our models are available on various inference frameworks; see the [inference documentation](documentation/inference.md) for more instructions.
<table>
<thead>
<tr>
<th>Framework</th>
<th>Notes</th>
</tr>
</thead>
<tbody>
<tr>
<td>ONNX</td>
<td>Tested on ONNX Runtime with CPU and CUDA backends. Provided models use opset 12. <a href="documentation/inference.md#onnx">Doc</a>, <a href="https://github.com/PeterL1n/RobustVideoMatting/tree/onnx">Exporter</a>.</td>
</tr>
<tr>
<td>TensorFlow.js</td>
<td>Run the model on the web. <a href="https://peterl1n.github.io/RobustVideoMatting/#/demo">Demo</a>, <a href="https://github.com/PeterL1n/RobustVideoMatting/tree/tfjs">Starter Code</a>.</td>
</tr>
<tr>
<td>CoreML</td>
<td>CoreML does not support dynamic resolution; other resolutions can be exported yourself. Models require iOS 13+. <code>s</code> denotes <code>downsample_ratio</code>. <a href="documentation/inference.md#coreml">Doc</a>, <a href="https://github.com/PeterL1n/RobustVideoMatting/tree/coreml">Exporter</a>.</td>
</tr>
</tbody>
</table>
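As a concrete example, the ONNX models keep the recurrent interface. Below is a minimal ONNX Runtime sketch; the tensor names (`src`, `r1i`-`r4i`, `downsample_ratio`; outputs `fgr`, `pha`, `r1o`-`r4o`) and the zero-tensor initial states follow the inference docs, so verify them with `session.get_inputs()` on your download:

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('rvm_mobilenetv3_fp32.onnx')

rec = [np.zeros([1, 1, 1, 1], dtype=np.float32)] * 4  # zero initial recurrent states
downsample_ratio = np.array([0.25], dtype=np.float32)

for src in frames:  # src: [1, 3, H, W] float32 in [0, 1]; `frames` is your own loader
    fgr, pha, *rec = sess.run(None, {
        'src': src,
        'r1i': rec[0], 'r2i': rec[1], 'r3i': rec[2], 'r4i': rec[3],
        'downsample_ratio': downsample_ratio,
    })
```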
All models are available in [Google Drive](https://drive.google.com/drive/folders/1pBsG-SCTatv-95SnEuxmnvvlRx208VKj?usp=sharing) and [Baidu Pan](https://pan.baidu.com/s/1puPSxQqgBFOVpW4W7AolkA) (code: gym7).
<br>
## PyTorch Example
1. Install dependencies:
```sh
pip install -r requirements_inference.txt
```
2. Load the model:
```python
import torch
from model import MattingNetwork

model = MattingNetwork('mobilenetv3').eval().cuda()  # or 'resnet50'
model.load_state_dict(torch.load('rvm_mobilenetv3.pth'))  # downloaded weights
```
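3. Convert a video with the conversion API in `inference.py`. A minimal sketch, with argument names taken from the repository's inference docs (verify them against your checkout):
```python
from inference import convert_video

convert_video(
    model,                          # the loaded MattingNetwork, on CPU or CUDA
    input_source='input.mp4',       # a video file or an image sequence directory
    output_type='video',            # 'video' or 'png_sequence'
    output_composition='com.mp4',   # composited output (file path for video)
    output_alpha='pha.mp4',         # optional: raw alpha prediction
    output_foreground='fgr.mp4',    # optional: raw foreground prediction
    downsample_ratio=None,          # None lets the converter pick automatically
    seq_chunk=12,                   # process n frames at once for parallelism
)
```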
Please see [inference documentation](documentation/inference.md) for details on `downsample_ratio` hyperparameter, more converter arguments, and more advanced usage.
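Alternatively, you can write your own loop and cycle the recurrent states yourself. A minimal sketch, assuming the `VideoReader`/`VideoWriter` helpers in this repo's `inference_utils.py` and frames normalized to [0, 1]:

```python
import torch
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
from inference_utils import VideoReader, VideoWriter

reader = VideoReader('input.mp4', transform=ToTensor())
writer = VideoWriter('output.mp4', frame_rate=30)

bgr = torch.tensor([0.47, 1, 0.6]).view(3, 1, 1).cuda()  # green background color
rec = [None] * 4                                         # initial recurrent states
downsample_ratio = 0.25                                  # adjust per resolution

with torch.no_grad():
    for src in DataLoader(reader):                       # RGB tensor in [0, 1]
        fgr, pha, *rec = model(src.cuda(), *rec, downsample_ratio)
        com = fgr * pha + bgr * (1 - pha)                # composite onto green
        writer.write(com)
```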
<br>
## Training and Evaluation
Please refer to the [training documentation](documentation/training.md) to train and evaluate your own model.
<br>
## Speed
Speed is measured with `inference_speed_test.py` for reference.
* Note 1: HD uses `downsample_ratio=0.25`, 4K uses `downsample_ratio=0.125`. All tests use batch size 1 and frame chunk 1.
* Note 2: GPUs before the Turing architecture do not support FP16 inference, so the GTX 1080 Ti uses FP32.
* Note 3: We only measure tensor throughput (see the sketch after these notes). The video conversion script provided in this repo is expected to be much slower, because it does not use hardware video encoding/decoding and does not perform tensor transfer on parallel threads. If you are interested in implementing hardware video encoding/decoding in Python, refer to [PyNvCodec](https://github.com/NVIDIA/VideoProcessingFramework).
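For reference, here is a rough sketch of what a pure tensor-throughput measurement looks like. This is not `inference_speed_test.py` itself; it assumes `model` is loaded as in the PyTorch example above:

```python
import time
import torch

src = torch.randn(1, 3, 1080, 1920, device='cuda')  # one HD frame, batch size 1
rec = [None] * 4

with torch.no_grad():
    for _ in range(10):                              # warm up the GPU and caches
        fgr, pha, *rec = model(src, *rec, 0.25)      # HD uses downsample_ratio=0.25
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(100):
        fgr, pha, *rec = model(src, *rec, 0.25)
    torch.cuda.synchronize()                         # wait for queued GPU work
    print(f'{100 / (time.time() - start):.1f} FPS')
```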
<br>

## Third-Party Resources
* NCNN C++ and Android Demo: [ncnn_Android_RobustVideoMatting](https://github.com/FeiGeChuanShu/ncnn_Android_RobustVideoMatting) from [FeiGeChuanShu](https://github.com/FeiGeChuanShu)
* ONNXRuntime C++ Demo: [lite.ai.toolkit](https://github.com/DefTruth/lite.ai.toolkit/blob/main/ort/cv/rvm.cpp) and [demo](https://github.com/DefTruth/RobustVideoMatting.lite.ai.toolkit) from [DefTruth](https://github.com/DefTruth)