Deep-Learning-Image-Captioning - Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Last update: Apr 06, 2022

Overview

Deep Learning - Image Captioning with Convolutional and Recurrent Neural Nets
========================================================================

Author: Jonathan Kuo
Python: 3.6.1
TensorFlow: 1.0.1
Keras: 2.0.4

Implementing convolutional and recurrent neural networks in Keras to generate sentence descriptions of images

Introduction

The Keras deep learning architecture of this project was inspired by "Deep Visual-Semantic Alignments for Generating Image Descriptions" by Andrej Karpathy and Fei-Fei Li.

Given a dataset of images and their sentence descriptions, define a Keras (TensorFlow backend) deep learning model that aligns detected regions of an image with segments of its description. Learning this alignment allows the model to output novel descriptions for test images.

Dataset

Microsoft Common Objects in Context (MSCOCO) is an image recognition, segmentation, and captioning dataset. The training data includes 123,000 image-caption pairs. The validation and testing data each contain 5,000 image-caption pairs.
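COCO caption annotations are distributed as JSON with an `images` list and an `annotations` list. A minimal sketch of grouping captions per image, assuming that layout, might look like the following. The function name `captions_by_image` and the sample records are made up for illustration and are not taken from this project.

```python
import json
from collections import defaultdict


def captions_by_image(annotation_data):
    """Group caption strings by image file name (hypothetical helper)."""
    # Map each image id to its file name.
    file_names = {img["id"]: img["file_name"] for img in annotation_data["images"]}
    grouped = defaultdict(list)
    for ann in annotation_data["annotations"]:
        # Each annotation points back to its image through "image_id".
        grouped[file_names[ann["image_id"]]].append(ann["caption"].strip())
    return dict(grouped)


# Tiny hand-written sample in the assumed layout (not real MSCOCO records).
sample = json.loads("""
{
  "images": [{"id": 1, "file_name": "a.jpg"}, {"id": 2, "file_name": "b.jpg"}],
  "annotations": [
    {"image_id": 1, "caption": "A dog runs on grass. "},
    {"image_id": 2, "caption": "Two people ride bikes."},
    {"image_id": 1, "caption": "A brown dog outside."}
  ]
}
""")
pairs = captions_by_image(sample)
```

With a real annotation file, the `sample` object would be replaced by the result of `json.load` on that file.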

Architecture

VGG16 (loaded in Keras) with weights pre-trained on ImageNet is used as the CNN to detect objects in the image. The last dense layer, the 1000-class softmax classifier, is removed so that the 4096-D activations can be passed into the RNN (LSTM). CNN weights are frozen, and RNN weights are updated by backpropagation through time (BPTT). The CNN and LSTM outputs are merged before being passed into a second LSTM, which predicts the next word in the sequence. RMSprop is used as the optimizer to help mitigate the vanishing gradient problem.
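The wiring described above could be sketched roughly as follows. This is not the project's code: it uses the current `tensorflow.keras` functional API rather than Keras 2.0.4, and the layer sizes (`embed_dim`, `units`), the choice of concatenation as the merge, the sparse cross-entropy loss, and the function names are all assumptions made for illustration.

```python
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16


def build_feature_extractor(weights="imagenet"):
    """VGG16 cut at its 4096-D 'fc2' layer, with all weights frozen."""
    vgg = VGG16(weights=weights, include_top=True)
    extractor = Model(inputs=vgg.input, outputs=vgg.get_layer("fc2").output)
    extractor.trainable = False  # CNN weights are not updated during training
    return extractor


def build_caption_model(vocab_size, max_len, embed_dim=256, units=256):
    """Next-word predictor over 4096-D image features and a partial caption."""
    # Image branch: project the CNN activations and repeat them per time step.
    image_in = layers.Input(shape=(4096,), name="image_features")
    image_vec = layers.Dense(embed_dim, activation="relu")(image_in)
    image_seq = layers.RepeatVector(max_len)(image_vec)

    # Language branch: word ids -> embeddings -> first LSTM.
    words_in = layers.Input(shape=(max_len,), dtype="int32", name="partial_caption")
    word_seq = layers.Embedding(vocab_size, embed_dim)(words_in)
    word_seq = layers.LSTM(units, return_sequences=True)(word_seq)

    # Merge both branches, then a second LSTM predicts the next word.
    merged = layers.Concatenate()([image_seq, word_seq])
    hidden = layers.LSTM(units)(merged)
    next_word = layers.Dense(vocab_size, activation="softmax")(hidden)

    model = Model(inputs=[image_in, words_in], outputs=next_word)
    # Integer word ids as targets (assumed), optimized with RMSprop.
    model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
    return model
```

Because the CNN is frozen, its 4096-D features can be computed once per image with `build_feature_extractor()` and cached, so only the caption model needs to run during training.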

Demo

View the demo IPython notebook for model training and prediction on the MSCOCO dataset.
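Prediction with a next-word model is usually an iterative loop: feed the words generated so far, pick a next word, and repeat until an end marker appears. The sketch below shows greedy decoding under that assumption. The `predict_next` callable, the `<start>` and `<end>` tokens, and the toy probability table are hypothetical stand-ins, and the notebook may decode differently.

```python
def greedy_caption(predict_next, start="<start>", end="<end>", max_len=20):
    """Build a caption by always taking the most probable next word."""
    words = [start]
    for _ in range(max_len):
        # predict_next returns a dict mapping candidate words to probabilities.
        probs = predict_next(words)
        next_word = max(probs, key=probs.get)
        if next_word == end:
            break
        words.append(next_word)
    return " ".join(words[1:])  # drop the start token


# Toy stand-in for a trained model: probabilities depend only on the last word.
toy_table = {
    "<start>": {"a": 0.9, "dog": 0.1},
    "a": {"dog": 0.8, "<end>": 0.2},
    "dog": {"<end>": 0.7, "a": 0.3},
}
caption = greedy_caption(lambda words: toy_table[words[-1]])
```

In a real setting, `predict_next` would wrap a call to the trained model with the image features and the encoded partial caption.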
