CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering" official PyTorch implementation.

Overview

LED2-Net

This is PyTorch implementation of our CVPR 2021 Oral paper "LED2-Net: Monocular 360˚ Layout Estimation via Differentiable Depth Rendering".

You can visit our project website and upload your own panorama to see the 3D results!

[Project Website] [Paper (arXiv)]

Prerequisite

This repo is primarily based on PyTorch. You can use the follwoing command to intall the dependencies.

pip install -r requirements.txt

Preparing Training Data

Under LED2Net/Dataset, we provide the dataloader of Matterport3D and Realtor360. The annotation formats of the two datasets follows PanoAnnotator. The detailed description of the format is explained in LayoutMP3D.

Under config/, config_mp3d.yaml and config_realtor360.yaml are the configuration file for Matterport3D and Realtor360.

Matterport3D

To train/val on Matterport3D, please modify the two items in config_mp3d.yaml.

dataset_image_path: &dataset_image_path '/path/to/image/location'
dataset_label_path: &dataset_label_path '/path/to/label/location'

The dataset_image_path and dataset_label_path follow the folder structure:

  dataset_image_path/
  |-------17DRP5sb8fy/
          |-------00ebbf3782c64d74aaf7dd39cd561175/
                  |-------color.jpg
          |-------352a92fb1f6d4b71b3aafcc74e196234/
                  |-------color.jpg
          .
          .
  |-------gTV8FGcVJC9/
          .
          .
  dataset_label_path/
  |-------mp3d_train.txt
  |-------mp3d_val.txt
  |-------mp3d_test.txt
  |-------label/
          |-------Z6MFQCViBuw_543e6efcc1e24215b18c4060255a9719_label.json
          |-------yqstnuAEVhm_f2eeae1a36f14f6cb7b934efd9becb4d_label.json
          .
          .
          .

Then run main.py and specify the config file path

python main.py --config config/config_mp3d.yaml --mode train # For training
python main.py --config config/config_mp3d.yaml --mode val # For testing

Realtor360

To train/val on Realtor360, please modify the item in config_realtor360.yaml.

dataset_path: &dataset_path '/path/to/dataset/location'

The dataset_path follows the folder structure:

  dataset_path/
  |-------train.txt
  |-------val.txt
  |-------sun360/
          |-------pano_ajxqvkaaokwnzs/
                  |-------color.png
                  |-------label.json
          .
          .
  |-------istg/
          |-------1/
                  |-------1/
                          |-------color.png
                          |-------label.json
                  |-------2/
                          |-------color.png
                          |-------label.json
                  .
                  .
          .
          .
          
  

Then run main.py and specify the config file path

python main.py --config config/config_realtor360.yaml --mode train # For training
python main.py --config config/config_realtor360.yaml --mode val # For testing

Run Inference

After finishing the training, you can use the following command to run inference on your own data (xxx.jpg or xxx.png).

python run_inference.py --config YOUR_CONFIG --src SRC_FOLDER/ --dst DST_FOLDER --ckpt XXXXX.pkl

This script will predict the layouts of all images (jpg or png) under SRC_FOLDER/ and store the results as json files under DST_FOLDER/.

Pretrained Weights

We provide the pretrained model of Realtor360 in this link.

Currently, we use DuLa-Net's post processing for inference. We will release the version using HorizonNet's post processing later.

Layout Visualization

To visualize the 3D layout, we provide the visualization tool in 360LayoutVisualizer. Please clone it and install the corresponding packages. Then, run the following command

cd 360LayoutVisualizer/
python visualizer.py --img xxxxxx.jpg --json xxxxxx.json

Citation

@misc{wang2021led2net,
      title={LED2-Net: Monocular 360 Layout Estimation via Differentiable Depth Rendering}, 
      author={Fu-En Wang and Yu-Hsuan Yeh and Min Sun and Wei-Chen Chiu and Yi-Hsuan Tsai},
      year={2021},
      eprint={2104.00568},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
Owner
Fu-En Wang
Hi, I am a member of VSLAB in National Tsing Hua University. You can check my personal website for more research projects (https://fuenwang.ml/).
Fu-En Wang
Python-based tools for document analysis and OCR

ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so

OCRopus 3.2k Dec 31, 2022
TextBoxes: A Fast Text Detector with a Single Deep Neural Network https://github.com/MhLiao/TextBoxes 基于SSD改进的文本检测算法,textBoxes_note记录了之前整理的笔记。

TextBoxes: A Fast Text Detector with a Single Deep Neural Network Introduction This paper presents an end-to-end trainable fast scene text detector, n

zhangjing1 24 Apr 28, 2022
Python Computer Vision Aim Bot for Roblox's Phantom Forces

Python-Phantom-Forces-Aim-Bot Python Computer Vision Aim Bot for Roblox's Phanto

drag0ngam3s 2 Jul 11, 2022
DouZero is a reinforcement learning framework for DouDizhu - 斗地主AI

[ICML 2021] DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning | 斗地主AI

Kwai 3.1k Jan 05, 2023
graph learning code for ogb

The final code for OGB Installation Requirements: ogb=1.3.1 torch=1.7.0 torch-geometric=1.7.0 torch-scatter=2.0.6 torch-sparse=0.6.9 Baseline models T

PierreHao 20 Nov 10, 2022
Perspective recovery of text using transformed ellipses

unproject_text Perspective recovery of text using transformed ellipses. See full writeup at https://mzucker.github.io/2016/10/11/unprojecting-text-wit

Matt Zucker 111 Nov 13, 2022
Amazing 3D explosion animation using Pygame module.

3D Explosion Animation 💣 💥 🔥 Amazing explosion animation with Pygame. 💣 Explosion physics An Explosion instance is made of a set of Particle objec

Dylan Tintenfich 12 Mar 11, 2022
Morphological edge detection or object's boundary detection using erosion and dialation in OpenCV python

Morphologycal-edge-detection-using-erosion-and-dialation the task is to detect object boundary using erosion or dialation . Here, use the kernel or st

Tamzid hasan 3 Nov 25, 2022
Repository for Scene Text Detection with Supervised Pyramid Context Network with tensorflow.

Scene-Text-Detection-with-SPCNET Unofficial repository for [Scene Text Detection with Supervised Pyramid Context Network][https://arxiv.org/abs/1811.0

121 Oct 15, 2021
OpenCVを用いたカメラキャリブレーションのサンプルです。2021/06/21時点でPython実装のある3種類(通常カメラ向け、魚眼レンズ向け(fisheyeモジュール)、全方位カメラ向け(omnidirモジュール))について用意しています。

OpenCV-CameraCalibration-Example FishEyeCameraCalibration.mp4 OpenCVを用いたカメラキャリブレーションのサンプルです 2021/06/21時点でPython実装のある以下3種類について用意しています。 通常カメラ向け 魚眼レンズ向け(

KazuhitoTakahashi 34 Nov 17, 2022
A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV.

DcoumentScanner A document scanner application for laptops/desktops developed using python, Tkinter and OpenCV. Directly install the .exe file to inst

Harsh Vardhan Singh 1 Oct 29, 2021
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

ocr-fileformat Validate and transform between OCR file formats (hOCR, ALTO, PAGE, FineReader) Installation Docker System-wide Usage CLI GUI API Transf

Universitätsbibliothek Mannheim 152 Dec 20, 2022
Here use convulation with sobel filter from scratch in opencv python .

Here use convulation with sobel filter from scratch in opencv python .

Tamzid hasan 2 Nov 11, 2021
virtual mouse which can copy files, close tabs and many other features !

AI Virtual Mouse Controller Developed an AI-based system to control the mouse cursor using Python and OpenCV with the real-time camera. Fingertip loca

Diwas Pandey 23 Oct 05, 2021
Official PyTorch implementation for "Mixed supervision for surface-defect detection: from weakly to fully supervised learning"

Mixed supervision for surface-defect detection: from weakly to fully supervised learning [Computers in Industry 2021] Official PyTorch implementation

ViCoS Lab 169 Dec 30, 2022
Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding for Zero-Example Video Retrieval.

Dual Encoding for Video Retrieval by Text Source code of our TPAMI'21 paper Dual Encoding for Video Retrieval by Text and CVPR'19 paper Dual Encoding

81 Dec 01, 2022
Total Text Dataset. It consists of 1555 images with more than 3 different text orientations: Horizontal, Multi-Oriented, and Curved, one of a kind.

Total-Text-Dataset (Official site) Updated on April 29, 2020 (Detection leaderboard is updated - highlighted E2E methods. Thank you shine-lcy.) Update

Chee Seng Chan 671 Dec 27, 2022
Create single line SVG illustrations from your pictures

Create single line SVG illustrations from your pictures

Javier Bórquez 686 Dec 26, 2022
基于图像识别的开源RPA工具,理论上可以支持所有windows软件和网页的自动化

SimpleRPA 基于图像识别的开源RPA工具,理论上可以支持所有windows软件和网页的自动化 简介 SimpleRPA是一款python语言编写的开源RPA工具(桌面自动控制工具),用户可以通过配置yaml格式的文件,来实现桌面软件的自动化控制,简化繁杂重复的工作,比如运营人员给用户发消息,

Song Hui 7 Jun 26, 2022