U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

Related tags

Deep LearningU-2-Net
Overview

U2-Net: U Square Net

The official repo for our paper U2-Net(U square net) published in Pattern Recognition 2020:

U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection

Xuebin Qin, Zichen Zhang, Chenyang Huang, Masood Dehghan, Osmar R. Zaiane and Martin Jagersand

Contact: xuebin[at]ualberta[dot]ca

Updates !!!

(2021-May-5) Thank AK391 for sharing his Gradio Web Demo of U2-Net.

gradio_web_demo

(2021-Apr-29) Thanks Jonathan Benavides Vallejo for releasing his App LensOCR: Extract Text & Image, which uses U2-Net for extracting the image foreground.

LensOCR APP

(2021-Apr-18) Thanks Andrea Scuderi for releasing his App Clipping Camera, which is an U2-Net driven realtime camera app and "is able to detect relevant object from the scene and clip them to apply fancy filters".

Clipping Camera APP

(2021-Mar-17) Dennis Bappert re-trained the U2-Net model for human portrait matting. The results look very promising and he also provided the details of the training process and data generation(and augmentation) strategy, which are inspiring.

(2021-Mar-11) Dr. Tim developed a video version rembg for removing video backgrounds using U2-Net. The awesome demo results can be found on YouTube.

(2021-Mar-02) We found some other interesting applications of our U2-Net including MOJO CUT, Real-Time Background Removal on Iphone, Video Background Removal, Another Online Portrait Generation Demo on AWS, AI Scissor.

(2021-Feb-15) We just released an online demo http://profu.ai for the portrait generation. Please feel free to give it a try and provide any suggestions or comments.
Profuai

(2021-Feb-06) Recently, some people asked the problem of using U2-Net for human segmentation, so we trained another example model for human segemntation based on Supervisely Person Dataset.

(1) To run the human segmentation model, please first downlowd the u2net_human_seg.pth model weights into ./saved_models/u2net_human_seg/.
(2) Prepare the to-be-segmented images into the corresponding directory, e.g. ./test_data/test_human_images/.
(3) Run the inference by command: python u2net_human_seg_test.py and the results will be output into the corresponding dirctory, e.g. ./test_data/u2net_test_human_images_results/
Notes: Due to the labeling accuracy of the Supervisely Person Dataset, the human segmentation model (u2net_human_seg.pth) here won't give you hair-level accuracy. But it should be more robust than u2net trained with DUTS-TR dataset on general human segmentation task. It can be used for human portrait segmentation, human body segmentation, etc.

Human Image Segmentation
Human Video Human Video Results

(2020-Dec-28) Some interesting applications and useful tools based on U2-Net:
(1) Xiaolong Liu developed several very interesting applications based on U2-Net including Human Portrait Drawing(As far as I know, Xiaolong is the first one who uses U2-Net for portrait generation), image matting and so on.
(2) Vladimir Seregin developed an interesting tool, NN based lineart, for comparing the portrait results of U2-Net and that of another popular model, ArtLine, developed by Vijish Madhavan.
(3) Daniel Gatis built a python tool, Rembg, for image backgrounds removal based on U2-Net. I think this tool will greatly facilitate the application of U2-Net in different fields.

(2020-Nov-21) Recently, we found an interesting application of U2-Net for human portrait drawing. Therefore, we trained another model for this task based on the APDrawingGAN dataset.

Sample Results: Kids

Sample Results: Ladies

Sample Results: Men

Usage for portrait generation

  1. Clone this repo to local
git clone https://github.com/NathanUA/U-2-Net.git
  1. Download the u2net_portrait.pth from GoogleDrive or Baidu Pan(提取码:chgd)model and put it into the directory: ./saved_models/u2net_portrait/.

  2. Run on the testing set.
    (1) Download the train and test set from APDrawingGAN. These images and their ground truth are stitched side-by-side (512x1024). You need to split each of these images into two 512x512 images and put them into ./test_data/test_portrait_images/portrait_im/. You can also download the split testing set on GoogleDrive.
    (2) Running the inference with command python u2net_portrait_test.py will ouptut the results into ./test_data/test_portrait_images/portrait_results.

  3. Run on your own dataset.
    (1) Prepare your images and put them into ./test_data/test_portrait_images/your_portrait_im/. To obtain enough details of the protrait, human head region in the input image should be close to or larger than 512x512. The head background should be relatively clear.
    (2) Run the prediction by command python u2net_portrait_demo.py will outputs the results to ./test_data/test_portrait_images/your_portrait_results/.
    (3) The difference between python u2net_portrait_demo.py and python u2net_portrait_test.py is that we added a simple face detection step before the portrait generation in u2net_portrait_demo.py. Because the testing set of APDrawingGAN are normalized and cropped to 512x512 for including only heads of humans, while our own dataset may varies with different resolutions and contents. Therefore, the code python u2net_portrait_demo.py will detect the biggest face from the given image and then crop, pad and resize the ROI to 512x512 for feeding to the network. The following figure shows how to take your own photos for generating high quality portraits.

(2020-Sep-13) Our U2-Net based model is the 6th in MICCAI 2020 Thyroid Nodule Segmentation Challenge.

(2020-May-18) The official paper of our U2-Net (U square net) (PDF in elsevier(free until July 5 2020), PDF in arxiv) is now available. If you are not able to access that, please feel free to drop me an email.

(2020-May-16) We fixed the upsampling issue of the network. Now, the model should be able to handle arbitrary input size. (Tips: This modification is to facilitate the retraining of U2-Net on your own datasets. When using our pre-trained model on SOD datasets, please keep the input size as 320x320 to guarantee the performance.)

(2020-May-16) We highly appreciate Cyril Diagne for building this fantastic AR project: AR Copy and Paste using our U2-Net (Qin et al, PR 2020) and BASNet(Qin et al, CVPR 2019). The demo video in twitter has achieved over 5M views, which is phenomenal and shows us more application possibilities of SOD.

U2-Net Results (176.3 MB)

U<sup>2</sup>-Net Results

Our previous work: BASNet (CVPR 2019)

Required libraries

Python 3.6
numpy 1.15.2
scikit-image 0.14.0
python-opencv PIL 5.2.0
PyTorch 0.4.0
torchvision 0.2.1
glob

Usage for salient object detection

  1. Clone this repo
git clone https://github.com/NathanUA/U-2-Net.git
  1. Download the pre-trained model u2net.pth (176.3 MB) from GoogleDrive or Baidu Pan 提取码: pf9k or u2netp.pth (4.7 MB) from GoogleDrive or Baidu Pan 提取码: 8xsi and put it into the dirctory './saved_models/u2net/' and './saved_models/u2netp/'

  2. Cd to the directory 'U-2-Net', run the train or inference process by command: python u2net_train.py or python u2net_test.py respectively. The 'model_name' in both files can be changed to 'u2net' or 'u2netp' for using different models.

We also provide the predicted saliency maps (u2net results,u2netp results) for datasets SOD, ECSSD, DUT-OMRON, PASCAL-S, HKU-IS and DUTS-TE.

U2-Net Architecture

U<sup>2</sup>-Net architecture

Quantitative Comparison

Quantitative Comparison

Quantitative Comparison

Qualitative Comparison

Qualitative Comparison

Citation

@InProceedings{Qin_2020_PR,
title = {U2-Net: Going Deeper with Nested U-Structure for Salient Object Detection},
author = {Qin, Xuebin and Zhang, Zichen and Huang, Chenyang and Dehghan, Masood and Zaiane, Osmar and Jagersand, Martin},
journal = {Pattern Recognition},
volume = {106},
pages = {107404},
year = {2020}
}
Owner
Xuebin Qin
Postdoctoral Fellow at University of Alberta Canada, Studying on object detection, segmentation, visual tracking, etc.
Xuebin Qin
The Malware Open-source Threat Intelligence Family dataset contains 3,095 disarmed PE malware samples from 454 families

MOTIF Dataset The Malware Open-source Threat Intelligence Family (MOTIF) dataset contains 3,095 disarmed PE malware samples from 454 families, labeled

Booz Allen Hamilton 112 Dec 13, 2022
MicRank is a Learning to Rank neural channel selection framework where a DNN is trained to rank microphone channels.

MicRank: Learning to Rank Microphones for Distant Speech Recognition Application Scenario Many applications nowadays envision the presence of multiple

Samuele Cornell 20 Nov 10, 2022
WarpDrive: Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning on a GPU

WarpDrive is a flexible, lightweight, and easy-to-use open-source reinforcement learning (RL) framework that implements end-to-end multi-agent RL on a single GPU (Graphics Processing Unit).

Salesforce 334 Jan 06, 2023
VarCLR: Variable Semantic Representation Pre-training via Contrastive Learning

    VarCLR: Variable Representation Pre-training via Contrastive Learning New: Paper accepted by ICSE 2022. Preprint at arXiv! This repository contain

squaresLab 32 Oct 24, 2022
Learn the Deep Learning for Computer Vision in three steps: theory from base to SotA, code in PyTorch, and space-repetition with Anki

DeepCourse: Deep Learning for Computer Vision arthurdouillard.com/deepcourse/ This is a course I'm giving to the French engineering school EPITA each

Arthur Douillard 113 Nov 29, 2022
This is the official code of L2G, Unrolling and Recurrent Unrolling in Learning to Learn Graph Topologies.

Learning to Learn Graph Topologies This is the official code of L2G, Unrolling and Recurrent Unrolling in Learning to Learn Graph Topologies. Requirem

Stacy X PU 16 Dec 09, 2022
Proto-RL: Reinforcement Learning with Prototypical Representations

Proto-RL: Reinforcement Learning with Prototypical Representations This is a PyTorch implementation of Proto-RL from Reinforcement Learning with Proto

Denis Yarats 74 Dec 06, 2022
A series of convenience functions to make basic image processing operations such as translation, rotation, resizing, skeletonization, and displaying Matplotlib images easier with OpenCV and Python.

imutils A series of convenience functions to make basic image processing functions such as translation, rotation, resizing, skeletonization, and displ

Adrian Rosebrock 4.3k Jan 08, 2023
Contains a bunch of different python programm tasks

py_tasks Contains a bunch of different python programm tasks Armstrong.py - calculate Armsrong numbers in range from 0 to n with / without cache and c

Dmitry Chmerenko 1 Dec 17, 2021
Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt

Feed forward VQGAN-CLIP model, where the goal is to eliminate the need for optimizing the latent space of VQGAN for each input prompt. This is done by

Mehdi Cherti 135 Dec 30, 2022
Reproduced Code for Image Forgery Detection papers.

Image Forgery Detection With over 4.5 billion active internet users, the amount of multimedia content being shared every day has surpassed everyone’s

Umar Masud 15 Dec 06, 2022
一套完整的微博舆情分析流程代码,包括微博爬虫、LDA主题分析和情感分析。

已经将项目的关键文件上传,包含微博爬虫、LDA主题分析和情感分析三个部分。 1.微博爬虫 实现微博评论爬取和微博用户信息爬取,一天大概十万条。 2.LDA主题分析 实现文档主题抽取,包括数据清洗及分词、主题数的确定(主题一致性和困惑度)和最优主题模型的选择(暴力搜索)。 3.情感分析 实现评论文本的

182 Jan 02, 2023
Make your master artistic punk avatar through machine learning world famous paintings.

Master-art-punk Make your master artistic punk avatar through machine learning world famous paintings. 通过机器学习世界名画制作属于你的大师级艺术朋克头像 Nowadays, NFT is beco

Philipjhc 53 Dec 27, 2022
PFLD pytorch Implementation

PFLD-pytorch Implementation of PFLD A Practical Facial Landmark Detector by pytorch. 1. install requirements pip3 install -r requirements.txt 2. Datas

zhaozhichao 669 Jan 02, 2023
The repo of Feedback Networks, CVPR17

Feedback Networks http://feedbacknet.stanford.edu/ Paper: Feedback Networks, CVPR 2017. Amir R. Zamir*,Te-Lin Wu*, Lin Sun, William B. Shen, Bertram E

Stanford Vision and Learning Lab 87 Nov 19, 2022
Re-implememtation of MAE (Masked Autoencoders Are Scalable Vision Learners) using PyTorch.

mae-repo PyTorch re-implememtation of "masked autoencoders are scalable vision learners". In this repo, it heavily borrows codes from codebase https:/

Peng Qiao 1 Dec 14, 2021
Code for "Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo"

Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo This repository includes the source code for our CVPR 2021 paper on multi-view mult

Jiahao Lin 66 Jan 04, 2023
Seeing if I can put together an interactive version of 3b1b's Manim in Streamlit

streamlit-manim Seeing if I can put together an interactive version of 3b1b's Manim in Streamlit Installation I had to install pango with sudo apt-get

Adrien Treuille 6 Aug 03, 2022
A-ESRGAN aims to provide better super-resolution images by using multi-scale attention U-net discriminators.

A-ESRGAN: Training Real-World Blind Super-Resolution with Attention-based U-net Discriminators The authors are hidden for the purpose of double blind

77 Dec 16, 2022
MediaPipeで姿勢推定を行い、Tokyo2020オリンピック風のピクトグラムを表示するデモ

Tokyo2020-Pictogram-using-MediaPipe MediaPipeで姿勢推定を行い、Tokyo2020オリンピック風のピクトグラムを表示するデモです。 Tokyo2020Pictgram02.mp4 Requirement mediapipe 0.8.6 or later O

KazuhitoTakahashi 295 Dec 26, 2022