The code is an implementation of Feedback Convolutional Neural Network for Visual Localization and Segmentation.

Last update: Dec 04, 2022

Overview

Feedback Convolutional Neural Network for Visual Localization and Segmentation

This repository implements the Feedback Convolutional Neural Network for Visual Localization and Segmentation. The code is written in PyTorch and is kept simple so that it is easy to follow.

A Caffe implementation is also available; refer to it if you use Caffe and MATLAB.

Requirements:

Python 3
PyTorch 0.4.0

How to run:

Launch Jupyter Notebook in the repository directory, then open vgg_fr.ipynb or vgg_fsp.ipynb. These two notebooks are the main files that demonstrate the feedback idea.

How it looks:

If you run vgg_fsp.ipynb without modifying the code, you should see the following visualizations:

Input image:

Image gradient with respect to the target label:

Image gradient with respect to the target label after 4 iterations of feedback selective pruning (FSP):
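The "image gradient with respect to the target label" shown above can be illustrated with a short PyTorch sketch. This is a minimal example under stated assumptions, not the notebook's actual code: the helper input_gradient and the tiny ToyNet stand-in network are hypothetical names introduced here so the example runs without downloading VGG weights, and the feedback selective pruning step itself is not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyNet(nn.Module):
    """Hypothetical stand-in for the VGG network (assumption, not repo code)."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)
        self.fc = nn.Linear(4, num_classes)

    def forward(self, x):
        x = F.relu(self.conv(x))
        x = x.mean(dim=3).mean(dim=2)  # global average pool -> (N, 4)
        return self.fc(x)


def input_gradient(model, image, target_label):
    """Return the gradient of the target class score w.r.t. the input image."""
    model.eval()
    image = image.clone().detach().requires_grad_(True)
    scores = model(image)               # shape (1, num_classes)
    scores[0, target_label].backward()  # backpropagate only the target score
    return image.grad.detach()


torch.manual_seed(0)
toy_model = ToyNet()
toy_image = torch.rand(1, 3, 32, 32)    # random stand-in for an input image
grad = input_gradient(toy_model, toy_image, target_label=3)

# Collapse the colour channels to get one saliency value per pixel.
saliency = grad.abs().max(dim=1)[0]     # shape (1, 32, 32)
```

With a pretrained classifier in place of ToyNet, the saliency map could be plotted as a heat map to produce a visualization similar in spirit to the gradient images listed in this section.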

Files explanation:

vgg_fr.ipynb: the main file that defines the VGG feedback network with the feedback recovering mechanism and runs a feedback visualization on exemplar images.
vgg_fsp.ipynb: the main file that defines the VGG feedback network with the feedback selective pruning mechanism and runs a feedback visualization on exemplar images.
images: stores the exemplar images.
imagenet1000_clsid_to_human.txt: stores the 1000 ImageNet class names, used for visualization and for interpreting predictions.
test/simple_test.ipynb: unit test for a simple feedback network built from a simple fully connected structure.
test/vgg_test.ipynb: unit test that loads a pretrained VGG network and then checks that the weights are copied correctly from the pretrained network to the newly defined network interface.
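As an illustration of how the class-name file might be used, the sketch below turns its contents into a lookup table from class index to readable name. It assumes the file holds a single Python dict literal of the form {0: 'tench, Tinca tinca', ...}, which is how this file is commonly distributed; that format is an assumption and should be checked against the actual file. The helper parse_class_names and the sample text are hypothetical.

```python
import ast


def parse_class_names(text):
    """Parse a dict literal mapping class index -> human-readable name."""
    mapping = ast.literal_eval(text)  # safe evaluation of literals only
    if not isinstance(mapping, dict):
        raise ValueError("expected a dict literal of class id -> name")
    return {int(k): str(v) for k, v in mapping.items()}


# Two-entry sample in the assumed format; with the real file one would use:
#   with open("imagenet1000_clsid_to_human.txt") as f:
#       class_names = parse_class_names(f.read())
sample_text = "{0: 'tench, Tinca tinca',\n 1: 'goldfish, Carassius auratus'}"
class_names = parse_class_names(sample_text)

# Many entries list several synonyms; keep the first one for plot titles.
top_name = class_names[1].split(",")[0]
```

Using ast.literal_eval rather than eval keeps the parsing restricted to plain literals, which is a reasonable precaution when reading a data file.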

Citation

Please consider citing the following paper in your publications if it helps your research:

@inproceedings{cao2015look,
  title={Look and think twice: Capturing top-down visual attention with feedback convolutional neural networks},
  author={Cao, Chunshui and Liu, Xianming and Yang, Yi and Yu, Yinan and Wang, Jiang and Wang, Zilei and Huang, Yongzhen and Wang, Liang and Huang, Chang and Xu, Wei and others},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision},
  pages={2956--2964},
  year={2015}
}
