Image captioning

End-to-end image captioning with EfficientNet-b3 + LSTM with Attention

The model is a seq2seq model. The encoder is a pretrained EfficientNet-b3 that extracts image features. The decoder is an LSTM with Bahdanau attention.
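
To make the attention step concrete, here is a minimal, dependency-free sketch of Bahdanau (additive) attention for a single decoding step. It is not the repository's implementation: the function name, argument names, and toy dimensions are all assumptions chosen for illustration, and the real model would do the same arithmetic with batched tensors.

```python
import math
import random


def additive_attention(encoder_features, decoder_hidden, w_enc, w_dec, v):
    """Return (context, weights) for one decoding step.

    encoder_features: list of L feature vectors, each of size E
    decoder_hidden:   previous decoder hidden state, size D
    w_enc: A x E matrix, w_dec: A x D matrix, v: vector of size A
    (all names and shapes here are illustrative assumptions)
    """
    def matvec(matrix, vector):
        return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

    # Project the decoder state once; it is shared by every image location.
    proj_dec = matvec(w_dec, decoder_hidden)

    # score_i = v . tanh(W_enc h_i + W_dec s)
    scores = []
    for feature in encoder_features:
        proj_enc = matvec(w_enc, feature)
        hidden = [math.tanh(e + d) for e, d in zip(proj_enc, proj_dec)]
        scores.append(sum(vi * hi for vi, hi in zip(v, hidden)))

    # Softmax over locations (shifted by the max for numerical stability).
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the encoder features.
    size = len(encoder_features[0])
    context = [
        sum(w * feature[i] for w, feature in zip(weights, encoder_features))
        for i in range(size)
    ]
    return context, weights


# Toy example: 4 image locations, feature size 3, hidden size 2, attention size 5.
rng = random.Random(0)
features = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
hidden_state = [rng.uniform(-1, 1) for _ in range(2)]
w_enc = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
w_dec = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(5)]
v = [rng.uniform(-1, 1) for _ in range(5)]

context, weights = additive_attention(features, hidden_state, w_enc, w_dec, v)
# The weights are positive and sum to 1 (up to rounding); the context has the feature size.
```

At each step the decoder would receive the context vector together with the embedding of the previous word, so the LSTM can focus on different image regions for different words.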

Dataset

The dataset is available on Kaggle and contains 8,000 images, each paired with five different captions.
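
As a rough illustration of the image-to-captions pairing, the sketch below groups captions by image. It assumes the caption file is a CSV with an `image,caption` header row; that layout, the function name `group_captions`, and the sample file names are assumptions, not something this README specifies.

```python
import csv
import io
from collections import defaultdict


def group_captions(lines):
    """Map each image file name to the list of its captions.

    `lines` is any iterable of text lines (an open file works too).
    Assumes a header row with the columns `image` and `caption`.
    """
    captions = defaultdict(list)
    for row in csv.DictReader(lines):
        captions[row["image"]].append(row["caption"].strip())
    return dict(captions)


# Hypothetical sample in the assumed format; a caption containing commas is quoted.
sample = io.StringIO(
    "image,caption\n"
    "dog.jpg,A dog runs across a field .\n"
    "dog.jpg,A brown dog is running outside .\n"
    'cat.jpg,"A cat, half asleep, sits on a sofa ."\n'
)
captions_by_image = group_captions(sample)
```

With the real file, passing `open(path, newline="")` instead of the `StringIO` sample should yield five captions per image if the assumed layout holds.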

Usage

Run in a terminal: python -m img_caption

Config

The user interface consists of a single file:

config.yaml - general configuration with data and model parameters

Default config.yaml:

data:
  path_to_data_folder: "data"
  caption_file_name: "captions.txt"
  images_folder_name: "Images"
  output_folder_name: "output"
  logging_file_name: "logging.txt"
  model_file_name: "model.pt"

batch_size: 32
num_worker: 1
gensim_model_name: "glove-wiki-gigaword-200"

model:
  embedding_dimension: 200
  decoder_hidden_dimension: 300
  learning_rate: 0.0001
  momentum: 0.9
  n_epochs: 50
  clip: 5
  fine_tune_encoder: false
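
In the default config, embedding_dimension (200) matches the vector size encoded in the gensim model name (glove-wiki-gigaword-200), so the two presumably have to be changed together. The sketch below is one way to check that before training. The helper `check_embedding_dimension` is hypothetical and not part of the package. It works on a plain dict that mirrors part of the YAML above, which in practice could be obtained with PyYAML's `yaml.safe_load`.

```python
import re


def check_embedding_dimension(config):
    """Raise ValueError if embedding_dimension disagrees with the model name.

    Assumes the vector size is the trailing number of `gensim_model_name`,
    as in "glove-wiki-gigaword-200". Names without a trailing number are skipped.
    """
    match = re.search(r"-(\d+)$", config["gensim_model_name"])
    if match is None:
        return  # size is not encoded in the name; nothing to check
    expected = int(match.group(1))
    actual = config["model"]["embedding_dimension"]
    if actual != expected:
        raise ValueError(
            f"embedding_dimension is {actual}, but "
            f"{config['gensim_model_name']} provides {expected}-dimensional vectors"
        )


# Dict mirroring the relevant part of the default config.yaml.
default_config = {
    "gensim_model_name": "glove-wiki-gigaword-200",
    "model": {"embedding_dimension": 200, "decoder_hidden_dimension": 300},
}
check_embedding_dimension(default_config)  # passes silently
```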

Output

After training, the pipeline produces the following file:

model.pt - checkpoint with:
- epoch - last epoch
- model_state_dict - model parameters
- optimizer_state_dict - the state of the optimizer
- train_history - training history of the model
- valid_history - validation history of the model
- best_valid_loss - the best validation loss
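
The checkpoint keys listed above can be inspected with a few lines of Python. The sketch below assumes that train_history and valid_history are lists holding one loss value per epoch, which this README does not state. The helper `summarize_checkpoint` is hypothetical. It takes the checkpoint as a plain dict, such as the one `torch.load` would return for model.pt.

```python
def summarize_checkpoint(checkpoint):
    """Return a small summary of a training checkpoint dict.

    Assumes `train_history` and `valid_history` are per-epoch loss lists.
    """
    valid_history = checkpoint["valid_history"]
    # Zero-based position of the lowest validation loss in the history.
    best_index = min(range(len(valid_history)), key=valid_history.__getitem__)
    return {
        "last_epoch": checkpoint["epoch"],
        "best_index": best_index,
        "best_valid_loss": checkpoint["best_valid_loss"],
        "final_train_loss": checkpoint["train_history"][-1],
    }


# Made-up values standing in for a loaded checkpoint (weights omitted).
example_checkpoint = {
    "epoch": 3,
    "model_state_dict": {},
    "optimizer_state_dict": {},
    "train_history": [4.0, 3.0, 2.5],
    "valid_history": [4.2, 3.1, 3.3],
    "best_valid_loss": 3.1,
}
summary = summarize_checkpoint(example_checkpoint)
```

A gap between the position of the best validation loss and the last epoch is a quick hint that training continued past the best model.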
