Official code for ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

ROCA: Robust CAD Model Retrieval and Alignment from a Single Image (CVPR 2022)

Code release of our paper ROCA. Check out our video, paper, and website!

If you find our paper or this repository helpful, please cite:

@inproceedings{gumeli2022roca,
  title={ROCA: Robust CAD Model Retrieval and Alignment from a Single Image},
  author={G{\"u}meli, Can and Dai, Angela and Nie{\ss}ner, Matthias},
  booktitle={Proc. Computer Vision and Pattern Recognition (CVPR), IEEE},
  year={2022}
}

Development Environment

We use the following development environment for this project:

  • Nvidia RTX 3090 GPU
  • Intel Xeon W-1370
  • Ubuntu 20.04
  • CUDA Version 11.2
  • cudatoolkit 11.0
  • Pytorch 1.7
  • Pytorch3D 0.5 or 0.6
  • Detectron2 0.3

Installation

This code was developed using anaconda3 with Python 3.8, so we recommend a similar setup.

You can simply run the following code in the command line to create the development environment:

$ source setup.sh
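After setup.sh finishes, it can be useful to confirm that the installed packages match the versions listed under Development Environment. The following is a minimal sketch, not part of the repository: the distribution names (`torch`, `pytorch3d`, `detectron2`) are assumptions about how each package is registered in your environment, and the check only compares version prefixes.

```python
from importlib import metadata

# Versions from the "Development Environment" list. The keys are assumed
# distribution names; adjust them if your environment registers them differently.
EXPECTED = {
    "torch": ("1.7",),
    "pytorch3d": ("0.5", "0.6"),
    "detectron2": ("0.3",),
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, prefixes in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Accept an exact match ("0.5") or a longer version ("1.7.1+cu110").
        ok = installed is not None and any(
            installed == p or installed.startswith(p + ".") for p in prefixes
        )
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "ok" if ok else "check"
        print(f"{name:12s} installed={installed} [{status}]")
```

A package reported as `check` is either missing or at a version other than the one we developed with; it may still work, but it is the first place to look if something fails.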

To visualize some demo results or to use the data preprocessing code, you need our custom rasterizer. In case the provided x86-64 Linux shared object does not work for you, you may install the rasterizer here.

Running the Demo

We provide four sample input images in the network/assets folder. The images were captured with a smartphone and then preprocessed to be compatible with the ROCA format. To run the demo, you first need to download the data and config from this Google Drive folder. The Models folder contains the pre-trained model and the config used, while the Data folder contains the images and dataset.

Assuming contents of the Models directory are in $MODEL_DIR and contents of the Data directory are in $DATA_DIR, you can run:

$ cd network
$ python demo.py --model_path $MODEL_DIR/model_best.pth --data_dir $DATA_DIR/Dataset --config_path $MODEL_DIR/config.yaml

The image overlay and the CAD visualization are displayed one by one. The Open3D mesh visualization is an interactive window where you can view the geometry from different viewpoints. Close the Open3D window to continue to the next visualization. You will see similar results to the image above.

For headless visualization, you can specify an output directory where resulting images and meshes are placed:

$ python demo.py --model_path $MODEL_DIR/model_best.pth --data_dir $DATA_DIR/Dataset --config_path $MODEL_DIR/config.yaml --output_dir $OUTPUT_DIR

You may use the --wild option to visualize results with "wild retrieval". Note that we omit the table category in this case due to its large size diversity.
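If you run the demo repeatedly, the invocations above can be assembled programmatically. The sketch below is a hypothetical helper, not part of the repository: it only builds the argument list from the flags shown above and reports which input files are missing. It assumes that --wild is a boolean switch that takes no value.

```python
import os
import shlex


def build_demo_command(model_dir, data_dir, output_dir=None, wild=False):
    """Assemble the demo.py invocation as an argument list (run it from network/)."""
    cmd = [
        "python", "demo.py",
        "--model_path", os.path.join(model_dir, "model_best.pth"),
        "--data_dir", os.path.join(data_dir, "Dataset"),
        "--config_path", os.path.join(model_dir, "config.yaml"),
    ]
    if output_dir is not None:
        # Headless mode: images and meshes are written here instead of shown.
        cmd += ["--output_dir", output_dir]
    if wild:
        # Assumption: --wild is a switch without an argument.
        cmd.append("--wild")
    return cmd


def missing_inputs(model_dir, data_dir):
    """List the expected input paths that do not exist yet."""
    expected = [
        os.path.join(model_dir, "model_best.pth"),
        os.path.join(model_dir, "config.yaml"),
        os.path.join(data_dir, "Dataset"),
    ]
    return [path for path in expected if not os.path.exists(path)]


if __name__ == "__main__":
    model_dir = os.environ.get("MODEL_DIR", "Models")
    data_dir = os.environ.get("DATA_DIR", "Data")
    for path in missing_inputs(model_dir, data_dir):
        print("missing:", path)
    print(shlex.join(build_demo_command(model_dir, data_dir)))
```

The printed command can be pasted into a shell inside the network directory, or the list can be passed to `subprocess.run(cmd, cwd="network")`.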

Preparing Data

Downloading Processed Data (Recommended)

We provide preprocessed images and labels in this Google Drive folder. Download and extract all folders to a desired location before running the training and evaluation code.

Rendering Data

Alternatively, you can render data yourself. Our data preparation code lives in the renderer folder.

Our project depends on the ShapeNet (Chang et al., '15), ScanNet (Dai et al., '16), and Scan2CAD (Avetisyan et al., '18) datasets. For ScanNet, we use the ScanNet25k images, which are provided as a zip file via the ScanNet download script.

Once you have the data, check the renderer/env.sh file for the locations of the different datasets. The meanings of the environment variables are described as inline comments in env.sh.

After editing renderer/env.sh, run the data generation script:

$ cd renderer
$ sh run.sh

Please check run.sh to see how the individual data preprocessing scripts are run, and feel free to customize the data pipeline!

Training and Evaluating Models

Our training code lives in the network directory. Open network/env.sh and edit the environment variables. Make sure the data directories are consistent with the locations of the downloaded and extracted folders. If you prepared the data manually, make sure the locations in network/env.sh are consistent with the variables set in renderer/env.sh.
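A quick way to catch a mistyped location before launching a long run is to read the assignments out of an env.sh file and test whether the paths exist. This is a rough sketch under a stated assumption: it assumes env.sh consists of simple `NAME=value` or `export NAME=value` lines, and it does not expand shell variables or command substitutions. The variable names in the usage example are hypothetical.

```python
import os
import re

# Matches "NAME=value" and "export NAME=value"; comment lines do not match.
_ASSIGN = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)=(.*)$")


def parse_env_assignments(text):
    """Collect simple shell assignments from the text of an env.sh file."""
    values = {}
    for line in text.splitlines():
        match = _ASSIGN.match(line)
        if not match:
            continue
        name, raw = match.groups()
        raw = raw.split(" #", 1)[0].strip()  # drop a trailing inline comment
        if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "'\"":
            raw = raw[1:-1]  # strip matching surrounding quotes
        values[name] = raw
    return values


def missing_paths(values):
    """Return the absolute-path values that do not exist on disk."""
    return {
        name: value
        for name, value in values.items()
        if value.startswith("/") and not os.path.exists(value)
    }


if __name__ == "__main__":
    with open("network/env.sh") as handle:
        settings = parse_env_assignments(handle.read())
    for name, path in missing_paths(settings).items():
        print(f"{name} points to a missing location: {path}")
```

For example, `parse_env_assignments('export DATA_DIR="/data/roca"  # images')` yields `{"DATA_DIR": "/data/roca"}`, where `DATA_DIR` is only an illustrative name.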

After you are done with network/env.sh, run the run.sh script to train a new model or evaluate an existing model based on the environment variables you set in env.sh:

$ cd network
$ sh run.sh

Replicating Experiments from the Main Paper

Based on the configurations in network/env.sh, you can run different ablations from the paper. The default config will run the (final) experiment. You can do the following edits cumulatively for different experiments:

  1. For P+E+W+R, set RETRIEVAL_MODE=resnet_resnet+image
  2. For P+E+W, set RETRIEVAL_MODE=nearest
  3. For P+E, set NOC_WEIGHTS=0
  4. For P, set E2E=0
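Because the edits are cumulative, each experiment further down the list also carries every edit above it. The sketch below spells this out under that reading of the list; it is a hypothetical helper, not part of the repository, and it only prints the assignments you would place in network/env.sh.

```python
# Edits from the list above, in order. Each experiment also includes every
# edit listed before it; a later value for the same variable wins.
ABLATION_STEPS = [
    ("P+E+W+R", {"RETRIEVAL_MODE": "resnet_resnet+image"}),
    ("P+E+W", {"RETRIEVAL_MODE": "nearest"}),
    ("P+E", {"NOC_WEIGHTS": "0"}),
    ("P", {"E2E": "0"}),
]


def overrides_for(experiment):
    """Return the accumulated env.sh edits for one ablation."""
    settings = {}
    for name, edit in ABLATION_STEPS:
        settings.update(edit)
        if name == experiment:
            return settings
    raise KeyError(f"unknown experiment: {experiment}")


if __name__ == "__main__":
    for name, _ in ABLATION_STEPS:
        lines = ", ".join(f"{k}={v}" for k, v in overrides_for(name).items())
        print(f"{name}: {lines}")
```

For instance, `overrides_for("P+E")` gives `RETRIEVAL_MODE=nearest` and `NOC_WEIGHTS=0`, while the default configuration (the final experiment) needs no edits at all.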

Resources

To get the datasets and gain further insight regarding our implementation, we refer to the following datasets and open-source codebases:

Datasets and Metadata

Libraries

Projects
