Code for CVPR'2022 paper ✨ "Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model"

Related tags

Computer VisionPPE
Overview

PPE

Repository for our CVPR'2022 paper:

Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model. Zipeng Xu, Tianwei Lin, Hao Tang, Fu Li, Dongliang He, Nicu Sebe, Radu Timofte, Luc Van Gool, Errui Ding. To appear in CVPR 2022.

Pytorch implementation is at here: zipengxuc/PPE-Pytorch.

Updates

24 Mar 2022: We update our arxiv-version paper.

30 Mar 2022: We have had some changes in releasing the code. Pytorch implementation is now at here: zipengxuc/PPE-Pytorch.

14 Apr 2022: Update our PaddlePaddle inference code in this repository.

To reproduce our results:

Setup:

  • Install CLIP:

    conda install --yes -c pytorch pytorch=1.7.1 torchvision cudatoolkit=<CUDA_VERSION>
    pip install ftfy regex tqdm gdown
    pip install git+https://github.com/openai/CLIP.git
  • Download pre-trained models:

    The code relies on the PaddleGAN (PaddlePaddle implementation of StyleGAN2). Download the pre-trained StyleGAN2 generator from here.

    We provided several pretrained PPE models on here.

  • Invert real images:

    The mapper is trained on latent vectors, so it is necessary to invert images into latent space. To edit human face, StyleCLIP provides the CelebA-HQ that was inverted by e4e: test set.

Usage:

Please first put downloaded pretraiend models and data on ckpt folder.

Inference

In PaddlePaddle version, we only provide inference code to generate editing results:

python mapper/evaluate.py

Reference

@article{xu2022ppe,
author = {Zipeng Xu and Tianwei Lin and Hao Tang and Fu Li and Dongliang He and Nicu Sebe and Radu Timofte and Luc Van Gool and Errui Ding},
title = {Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model},
journal = {arXiv preprint arXiv:2111.13333},
year = {2021}
}

If you have any questions, please contact [email protected]. :)

Owner
Zipeng Xu
Ph.D. student at MHUG, University of Trento. Research Interests: Generative Models, Vision-Language.
Zipeng Xu
Hand Detection and Finger Detection on Live Feed

Hand-Detection-On-Live-Feed Hand Detection and Finger Detection on Live Feed Getting Started Install the dependencies $ git clone https://github.com/c

Chauhan Mahaveer 2 Jan 02, 2022
Automatically download multiple papers by keywords in CVPR

CVFPaperHelper Automatically download multiple papers by keywords in CVPR Install mkdir PapersToRead cd PaperToRead pip install requests tqdm git clon

46 Jun 08, 2022
Ackermann Line Follower Robot Simulation.

Ackermann Line Follower Robot This is a simulation of a line follower robot that works with steering control based on Stanley: The Robot That Won the

Lucas Mazzetto 2 Apr 16, 2022
Captcha Recognition

The objective of this project is to recognize the target numbers in the captcha images correctly which would tell us how good or bad a captcha system has been built.

Mohit Kaushik 5 Feb 20, 2022
Some codes from PyImageSearch course's and external projects.

👨‍💻 Some codes and projects 👨‍💻 💡 Technologies 📜 Projects 📍 Chrome Dinosaur Controller 📦 Script 📍 Coins Counter 📦 Script 🤓 Author Lucas Biv

Lucas Bivar 25 Oct 24, 2021
轻量级公式 OCR 小工具:一键识别各类公式图片,并转换为 LaTeX 格式

QC-Formula | 青尘公式 OCR 介绍 轻量级开源公式 OCR 小工具:一键识别公式图片,并转换为 LaTeX 格式。 支持从 电脑本地 导入公式图片;(后续版本将支持直接从网页导入图片) 公式图片支持 .png / .jpg / .bmp,大小为 4M 以内均可; 支持印刷体及手写体,前

青尘工作室 26 Jan 07, 2023
Computer vision applications project (Flask and OpenCV)

Computer Vision Applications Project This project is at it's initial phase. This is all about the implementation of different computer vision techniqu

Suryam Thapa 1 Jan 26, 2022
Regions sanitàries (RS), Sectors Sanitàris (SS) i Àrees Bàsiques de Salut (ABS) de Catalunya

Regions sanitàries (RS), Sectors Sanitaris (SS), Àrees de Gestió Assistencial (AGA) i Àrees Bàsiques de Salut (ABS) de Catalunya Fitxers GeoJSON de le

Glòria Macià Muñoz 2 Jan 23, 2022
Solution for Problem 1 by team codesquad for AIDL 2020. Uses ML Kit for OCR and OpenCV for image processing

CodeSquad PS1 Solution for Problem Statement 1 for AIDL 2020 conducted by @unifynd technologies. Problem Given images of bills/invoices, the task was

Burhanuddin Udaipurwala 111 Nov 27, 2022
A Tensorflow model for text recognition (CNN + seq2seq with visual attention) available as a Python package and compatible with Google Cloud ML Engine.

Attention-based OCR Visual attention-based OCR model for image recognition with additional tools for creating TFRecords datasets and exporting the tra

Ed Medvedev 933 Dec 29, 2022
Crop regions in napari manually

napari-crop Crop regions in napari manually Usage Create a new shapes layer to annotate the region you would like to crop: Use the rectangle tool to a

Robert Haase 4 Sep 29, 2022
This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp

Jacobo José Guijarro Villalba 75 Oct 21, 2022
Code for CVPR 2022 paper "SoftGroup for Instance Segmentation on 3D Point Clouds"

SoftGroup We provide code for reproducing results of the paper SoftGroup for 3D Instance Segmentation on Point Clouds (CVPR 2022) Author: Thang Vu, Ko

Thang Vu 231 Dec 27, 2022
1st place solution for SIIM-FISABIO-RSNA COVID-19 Detection Challenge

SIIM-COVID19-Detection Source code of the 1st place solution for SIIM-FISABIO-RSNA COVID-19 Detection Challenge. 1.INSTALLATION Ubuntu 18.04.5 LTS CUD

Nguyen Ba Dung 170 Dec 21, 2022
This is the code for our paper DAAIN: Detection of Anomalous and AdversarialInput using Normalizing Flows

Merantix-Labs: DAAIN This is the code for our paper DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows which can be found at

Merantix 14 Oct 12, 2022
A version of nrsc5-gui that merges the interface developed by cmnybo with the architecture developed by zefie in order to start a new baseline that is not heavily dependent upon Python processing.

NRSC5-DUI is a graphical interface for nrsc5. It makes it easy to play your favorite FM HD radio stations using an RTL-SDR dongle. It will also displa

61 Dec 22, 2022
Basic functions manipulating images using the OpenCV library

OpenCV Basic functions manipulating images using the OpenCV library. Reading Ima

Shatha Siala 3 Feb 17, 2022
An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection

InceptText-Tensorflow An Implementation of the alogrithm in paper IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Orien

GeorgeJoe 115 Dec 12, 2022
Python-based tools for document analysis and OCR

ocropy OCRopus is a collection of document analysis programs, not a turn-key OCR system. In order to apply it to your documents, you may need to do so

OCRopus 3.2k Dec 31, 2022
An Implementation of the FOTS: Fast Oriented Text Spotting with a Unified Network

FOTS: Fast Oriented Text Spotting with a Unified Network Introduction This is a pytorch re-implementation of FOTS: Fast Oriented Text Spotting with a

GeorgeJoe 171 Aug 04, 2022