SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition

Abstract

Explainable artificial intelligence has been gaining attention in the past few years. However, most existing methods are based on gradients or intermediate features, which are not directly involved in the decision-making process of the classifier. In this paper, we propose a slot attention-based classifier called SCOUTER for transparent yet accurate classification. Two major differences from other attention-based methods include: (a) SCOUTER's explanation is involved in the final confidence for each category, offering more intuitive interpretation, and (b) all the categories have their corresponding positive or negative explanation, which tells "why the image is of a certain category" or "why the image is not of a certain category." We design a new loss tailored for SCOUTER that controls the model's behavior to switch between positive and negative explanations, as well as the size of explanatory regions. Experimental results show that SCOUTER can give better visual explanations while keeping good accuracy on small and medium-sized datasets.

Model Structure

SCOUTER is built on top of the recently emerged slot attention, which offers an object-centric approach to image representation. Based on this approach, we propose an explainable slot attention (xSlot) module. The output of the xSlot module is used directly as the confidence value for each category, so the commonly used fully-connected (FC) layer-based classifier is no longer necessary. The whole network, including the backbone, is trained with the SCOUTER loss, which controls the size of the explanatory regions and switches between positive and negative explanations.
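
The idea in the paragraph above can be illustrated with a heavily simplified sketch in plain Python. It is not the repository's implementation: the function names, the single attention pass, the absence of learned projections, and the exact form of the area penalty are all assumptions made for illustration. It only shows how a per-category attention map can produce the confidence directly, how a sign `e` (compare `--loss_status 1` and `-1` in the commands below) flips between positive and negative explanations, and how a weight `lam` (compare `--lambda_value`) can penalize large attended regions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def xslot_confidences(slots, features, e=1):
    """Toy, single-pass stand-in for an xSlot-like module (assumption).

    slots:    one query vector per category.
    features: one non-negative feature vector per spatial location.
    e:        +1 for positive explanations, -1 for negative ones.
    Returns (confidences, attention_maps).
    """
    confidences, attention_maps = [], []
    for query in slots:
        # Attention of this category's slot over every spatial location.
        attn = [sigmoid(sum(q * f for q, f in zip(query, feat)))
                for feat in features]
        # Attention-weighted features reduced to one scalar of "support".
        support = sum(a * sum(feat) for a, feat in zip(attn, features))
        # The signed support is used as the confidence; no FC layer involved.
        confidences.append(e * support)
        attention_maps.append(attn)
    return confidences, attention_maps


def scouter_style_loss(confidences, attention_maps, target, lam):
    """Cross-entropy plus lam times the total attended area (assumed form)."""
    m = max(confidences)
    log_z = m + math.log(sum(math.exp(c - m) for c in confidences))
    cross_entropy = log_z - confidences[target]
    area = sum(sum(attn) for attn in attention_maps)
    return cross_entropy + lam * area


if __name__ == "__main__":
    features = [[2.0, 0.0], [0.0, 1.0]]    # two spatial locations
    slots = [[5.0, -5.0], [-5.0, 5.0]]     # two categories
    conf, attn = xslot_confidences(slots, features, e=1)
    print(conf, scouter_style_loss(conf, attn, target=0, lam=1.0))
```

In this toy setting a larger `lam` raises the cost of attending to many locations, which is one way to read the statement that the loss controls the size of the explanatory regions.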

Usage

Enable distributed training (if desired)
python -m torch.distributed.launch --nproc_per_node=4 --use_env train.py --world_size 4
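
With `--use_env`, `torch.distributed.launch` passes each worker its local rank through the `LOCAL_RANK` environment variable rather than a `--local_rank` argument, and it also sets `RANK` and `WORLD_SIZE`. The sketch below shows how a script could read these values; the helper name `read_distributed_env` is hypothetical, and how `train.py` actually consumes them is an assumption not confirmed by this README.

```python
import os


def read_distributed_env(env=None):
    """Collect the launcher-provided settings, falling back to single-process."""
    env = os.environ if env is None else env
    world_size = int(env.get("WORLD_SIZE", "1"))
    return {
        "rank": int(env.get("RANK", "0")),              # global process index
        "local_rank": int(env.get("LOCAL_RANK", "0")),  # index on this machine
        "world_size": world_size,                       # total process count
        "distributed": world_size > 1,
    }


if __name__ == "__main__":
    print(read_distributed_env())
```

Run without the launcher, the helper reports a single non-distributed process, so the same entry point can serve both modes.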

ImageNet

Training for ImageNet dataset (Base Model)
python train.py --dataset ImageNet --model resnest26d --batch_size 70 --epochs 20 \
--num_classes 10 --use_slot false \
--vis false --channel 2048 --freeze_layers 0 \
--dataset_dir ../data/imagenet/ILSVRC/Data/CLS-LOC/
Positive Scouter for ImageNet dataset
python train.py --dataset ImageNet --model resnest26d --batch_size 70 --epochs 20 \
--num_classes 10 --use_slot true --use_pre false --loss_status 1 --slots_per_class 1 \
--power 2 --to_k_layer 3 --lambda_value 1 --vis false --channel 2048 --freeze_layers 0 \
--dataset_dir ../data/imagenet/ILSVRC/Data/CLS-LOC/
Negative Scouter for ImageNet dataset
python train.py --dataset ImageNet --model resnest26d --batch_size 70 --epochs 20 \
--num_classes 10 --use_slot true --use_pre false --loss_status -1 --slots_per_class 1 \
--power 2 --to_k_layer 3 --lambda_value 1 --vis false --channel 2048 --freeze_layers 0 \
--dataset_dir ../data/imagenet/ILSVRC/Data/CLS-LOC/
Visualization of Positive Scouter for ImageNet dataset
python test.py --dataset ImageNet --model resnest26d --batch_size 70 --epochs 20 \
--num_classes 10 --use_slot true --use_pre false --loss_status 1 --slots_per_class 1 \
--power 2 --to_k_layer 3 --lambda_value 1 --vis true --channel 2048 --freeze_layers 0 \
--dataset_dir ../data/imagenet/ILSVRC/Data/CLS-LOC/
Visualization of Negative Scouter for ImageNet dataset
python test.py --dataset ImageNet --model resnest26d --batch_size 70 --epochs 20 \
--num_classes 10 --use_slot true --use_pre false --loss_status -1 --slots_per_class 1 \
--power 2 --to_k_layer 3 --lambda_value 1 --vis true --channel 2048 --freeze_layers 0 \
--dataset_dir ../data/imagenet/ILSVRC/Data/CLS-LOC/
Visualization using torchcam for ImageNet dataset
python torchcam_vis.py --dataset ImageNet --model resnest26d --batch_size 70 \
--num_classes 10 --grad true --use_pre true \
--dataset_dir ../data/imagenet/ILSVRC/Data/CLS-LOC/ \
--grad_min_level 0

MNIST Dataset

Pre-training for MNIST dataset
python train.py --dataset MNIST --model resnet18 --batch_size 64 --epochs 10 \
--num_classes 10 --use_slot false --vis false --aug false
Positive Scouter for MNIST dataset
python train.py --dataset MNIST --model resnet18 --batch_size 64 --epochs 10 \
--num_classes 10 --use_slot true --use_pre true --loss_status 1 --slots_per_class 1 \
--power 1 --to_k_layer 1 --lambda_value 1. --vis false --channel 512 --aug false
Negative Scouter for MNIST dataset
python train.py --dataset MNIST --model resnet18 --batch_size 64 --epochs 10 \
--num_classes 10 --use_slot true --use_pre false --loss_status -1 --slots_per_class 2 \
--power 2 --to_k_layer 1 --lambda_value 1.5 --vis false --channel 512 --aug false --freeze_layers 3
Visualization of Positive Scouter for MNIST dataset
python test.py --dataset MNIST --model resnet18 --batch_size 64 --epochs 10 \
--num_classes 10 --use_slot true --use_pre true --loss_status 1 --slots_per_class 1 \
--power 1 --to_k_layer 1 --lambda_value 1. --vis true --channel 512 --aug false
Visualization of Negative Scouter for MNIST dataset
python test.py --dataset MNIST --model resnet18 --batch_size 64 --epochs 10 \
--num_classes 10 --use_slot true --use_pre false --loss_status -1 --slots_per_class 2 \
--power 2 --to_k_layer 1 --lambda_value 1.5 --vis true --channel 512 --aug false --freeze_layers 3
Visualization using torchcam for MNIST dataset
python torchcam_vis.py --dataset MNIST --model resnet18 --batch_size 64 \
--num_classes 10 --grad true --use_pre true

Con-Text Dataset

Pre-training for ConText dataset
python train.py --dataset ConText --model resnest26d --batch_size 200 --epochs 100 \
--num_classes 30 --use_slot false --vis false \
--dataset_dir ../data/con-text/JPEGImages/
Positive Scouter for ConText dataset
python train.py --dataset ConText --model resnest26d --batch_size 200 --epochs 100 \
--num_classes 30 --use_slot true --use_pre true --loss_status 1 --slots_per_class 3 \
--power 2 --to_k_layer 3 --lambda_value .2 --vis false --channel 2048 \
--dataset_dir ../data/con-text/JPEGImages/
Negative Scouter for ConText dataset
python train.py --dataset ConText --model resnest26d --batch_size 200 --epochs 100 \
--num_classes 30 --use_slot true --use_pre true --loss_status -1 --slots_per_class 3 \
--power 2 --to_k_layer 3 --lambda_value 1. --vis false --channel 2048 \
--dataset_dir ../data/con-text/JPEGImages/
Visualization of Positive Scouter for ConText dataset
python test.py --dataset ConText --model resnest26d --batch_size 200 --epochs 100 \
--num_classes 30 --use_slot true --use_pre true --loss_status 1 --slots_per_class 3 \
--power 2 --to_k_layer 3 --lambda_value 1. --vis true --channel 2048 \
--dataset_dir ../data/con-text/JPEGImages/
Visualization of Negative Scouter for ConText dataset
python test.py --dataset ConText --model resnest26d --batch_size 200 --epochs 100 \
--num_classes 30 --use_slot true --use_pre true --loss_status -1 --slots_per_class 3 \
--power 2 --to_k_layer 3 --lambda_value 1. --vis true --channel 2048 \
--dataset_dir ../data/con-text/JPEGImages/
Visualization using torchcam for ConText dataset
python torchcam_vis.py --dataset ConText --model resnest26d --batch_size 200 \
--num_classes 30 --grad true --use_pre true \
--dataset_dir ../data/con-text/JPEGImages/

CUB-200 Dataset

Pre-training for CUB-200 dataset
python train.py --dataset CUB200 --model resnest50d --batch_size 64 --epochs 150 \
--num_classes 25 --use_slot false --vis false --channel 2048 \
--dataset_dir ../data/bird_200/CUB_200_2011/CUB_200_2011/
Positive Scouter for CUB-200 dataset
python train.py --dataset CUB200 --model resnest50d --batch_size 64 --epochs 150 \
--num_classes 25 --use_slot true --use_pre true --loss_status 1 --slots_per_class 5 \
--power 2 --to_k_layer 3 --lambda_value 10 --vis false --channel 2048 --freeze_layers 2 \
--dataset_dir ../data/bird_200/CUB_200_2011/CUB_200_2011/
Negative Scouter for CUB-200 dataset
python train.py --dataset CUB200 --model resnest50d --batch_size 64 --epochs 150 \
--num_classes 25 --use_slot true --use_pre true --loss_status -1 --slots_per_class 3 \
--power 2 --to_k_layer 3 --lambda_value 1. --vis false --channel 2048 --freeze_layers 2 \
--dataset_dir ../data/bird_200/CUB_200_2011/CUB_200_2011/
Visualization of Positive Scouter for CUB-200 dataset
python test.py --dataset CUB200 --model resnest50d --batch_size 64 --epochs 150 \
--num_classes 25 --use_slot true --use_pre true --loss_status 1 --slots_per_class 5 \
--power 2 --to_k_layer 3 --lambda_value 10 --vis true --channel 2048 --freeze_layers 2 \
--dataset_dir ../data/bird_200/CUB_200_2011/CUB_200_2011/
Visualization of Negative Scouter for CUB-200 dataset
python test.py --dataset CUB200 --model resnest50d --batch_size 64 --epochs 150 \
--num_classes 25 --use_slot true --use_pre true --loss_status -1 --slots_per_class 3 \
--power 2 --to_k_layer 3 --lambda_value 1. --vis true --channel 2048 --freeze_layers 2 \
--dataset_dir ../data/bird_200/CUB_200_2011/CUB_200_2011/
Visualization using torchcam for CUB-200 dataset
python torchcam_vis.py --dataset CUB200 --model resnest50d --batch_size 150 \
--num_classes 25 --grad true --use_pre true \
--dataset_dir ../data/bird_200/CUB_200_2011/CUB_200_2011/
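
In most of the commands above, the training and visualization commands for a given model differ only in the script name and the `--vis` flag, while positive and negative variants differ in `--loss_status`. A small helper can keep the shared flags in one place. This is a convenience sketch, not part of the repository: `build_command` and `scouter_commands` are hypothetical names, and only flags that appear in this README are used.

```python
import shlex


def build_command(script, **flags):
    """Render `python <script> --flag value ...`, with booleans as true/false."""
    parts = ["python", script]
    for name, value in flags.items():
        if isinstance(value, bool):
            value = "true" if value else "false"
        parts += [f"--{name}", str(value)]
    return " ".join(shlex.quote(part) for part in parts)


def scouter_commands(positive, **shared):
    """Return (train, visualize) commands that share every other flag."""
    shared = dict(shared, use_slot=True, loss_status=1 if positive else -1)
    train = build_command("train.py", vis=False, **shared)
    visualize = build_command("test.py", vis=True, **shared)
    return train, visualize


if __name__ == "__main__":
    # Flag values copied from the negative MNIST example above.
    train_cmd, vis_cmd = scouter_commands(
        positive=False, dataset="MNIST", model="resnet18", batch_size=64,
        epochs=10, num_classes=10, use_pre=False, slots_per_class=2,
        power=2, to_k_layer=1, lambda_value=1.5, channel=512, aug=False,
        freeze_layers=3,
    )
    print(train_cmd)
    print(vis_cmd)
```

Generating both commands from one set of flags reduces the chance that the visualization run is started with different hyperparameters than the training run.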

Acknowledgements

This work was supported by Council for Science, Technology and Innovation (CSTI), cross-ministerial Strategic Innovation Promotion Program (SIP), "Innovative AI Hospital System" (Funding Agency: National Institute of Biomedical Innovation, Health and Nutrition (NIBIOHN)).

Publication

If you want to use this work, please consider citing the following paper.

@inproceedings{li2021scouter,
 author = {Liangzhi Li and Bowen Wang and Manisha Verma and Yuta Nakashima and Ryo Kawasaki and Hajime Nagahara},
 booktitle = {IEEE International Conference on Computer Vision (ICCV)},
 pages = {},
 title = {SCOUTER: Slot Attention-based Classifier for Explainable Image Recognition},
 year = {2021}
}
Owner
Bowen Wang