A deep learning model that explains the content of an image as speech, using caption generation with an attention mechanism on the Flickr8K dataset.

Last update: Feb 08, 2022

Related tags

Text Data & NLP Eye_for_the_blind

Overview

Eye for the blind

The goal is to create a deep learning model that explains the content of an image as speech, using caption generation with an attention mechanism on the Flickr8K dataset. Such a model can help blind people understand any image through speech. The caption generated by a CNN-RNN model is converted to speech using a text-to-speech library.

This problem statement applies both deep learning and natural language processing. A CNN-based encoder extracts the features of an image, and an RNN-based decoder turns those features into a caption.

The project is an extended application of the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention": https://arxiv.org/abs/1502.03044
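As a rough illustration of the attention idea from the paper, the sketch below scores each image region against the current decoder state, normalizes the scores with a softmax, and returns the weighted sum of region features as a context vector. It uses one common additive formulation and plain Python lists; the function name `additive_attention`, the weight matrices `W_f`, `W_h`, `v`, and all numbers are assumptions made for illustration, not code from this project.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(matrix, vector):
    return [dot(row, vector) for row in matrix]

def additive_attention(features, hidden, W_f, W_h, v):
    """Soft attention over image regions (illustrative, untrained weights).

    features: list of L region vectors from the CNN encoder, each of size D
    hidden:   decoder hidden state of size H
    W_f (A x D), W_h (A x H), v (A): hypothetical projection parameters
    Returns (context vector of size D, attention weights of length L).
    """
    projected_hidden = matvec(W_h, hidden)
    scores = []
    for region in features:
        projected_region = matvec(W_f, region)
        combined = [math.tanh(x + y)
                    for x, y in zip(projected_region, projected_hidden)]
        scores.append(dot(v, combined))
    weights = softmax(scores)
    size = len(features[0])
    context = [sum(w * region[d] for w, region in zip(weights, features))
               for d in range(size)]
    return context, weights

# Toy inputs: three image regions with two features each.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.5, -0.5]
W_f = [[1.0, 0.0], [0.0, 1.0]]
W_h = [[1.0, 0.0], [0.0, 1.0]]
v = [1.0, 1.0]

context, weights = additive_attention(features, hidden, W_f, W_h, v)
```

In a trained model the projection parameters are learned, and the context vector is fed to the RNN decoder at every step so that each generated word can focus on different regions of the image.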

The dataset is taken from Kaggle. It consists of 8,000 images, each paired with five different captions that clearly describe the salient entities and events in the image.
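The image-to-captions pairing can be sketched as a dictionary that maps each image file name to its list of captions. The example below assumes a comma-separated captions file with an `image,caption` header; that layout, the file names, and the sample captions are assumptions for illustration, so the parsing may need adjusting for the actual download.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the assumed "image,caption" layout.
SAMPLE = """image,caption
img_001.jpg,A dog runs across the grass .
img_001.jpg,A brown dog is running outside .
img_002.jpg,Two children play on a beach .
"""

def load_captions(handle):
    """Group captions by image file name."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle):
        grouped[row["image"]].append(row["caption"].strip())
    return dict(grouped)

captions_by_image = load_captions(io.StringIO(SAMPLE))

for image_name, captions in captions_by_image.items():
    print(image_name, len(captions))
```

With the full dataset, each image would be expected to map to five captions, which is a quick sanity check to run during the data understanding step.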

Project Pipeline

The project pipeline can be briefly summarized in the following five steps:

Data Understanding: Load the data and understand how it is represented.
Data Preprocessing: Process both images and captions into the desired format.
Train/Test Split: Combine images and captions to create the train and test datasets.
Model Building: Create the image captioning model by building the encoder, attention, and decoder components.
Model Evaluation: Evaluate the model using greedy search and BLEU score.
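The evaluation step can be sketched in two parts: greedy search, which picks the most probable next word at every step until an end token appears, and a BLEU score, which compares the generated caption with the reference captions. The code below is a simplified, unsmoothed sentence-level BLEU written from scratch, not a library implementation, and `toy_step` is a hypothetical stand-in for the trained decoder.

```python
import math
from collections import Counter

def greedy_decode(step_fn, start_token, end_token, max_len=20):
    """Pick the highest-probability token at each step.

    step_fn(tokens) must return a dict mapping candidate next tokens to
    probabilities; here it stands in for the attention decoder.
    """
    tokens = [start_token]
    for _ in range(max_len):
        probs = step_fn(tokens)
        next_token = max(probs, key=probs.get)
        if next_token == end_token:
            break
        tokens.append(next_token)
    return tokens[1:]  # drop the start token

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate, references, max_n=2):
    """Simplified BLEU: clipped n-gram precision, uniform weights, brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        if not cand_counts:
            return 0.0
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / sum(cand_counts.values())))
    cand_len = len(candidate)
    # Reference length closest to the candidate length.
    ref_len = min((abs(len(ref) - cand_len), len(ref)) for ref in references)[1]
    penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return penalty * math.exp(sum(log_precisions) / max_n)

def toy_step(tokens):
    """Hypothetical decoder: always favours the next word of a fixed script."""
    script = ["a", "dog", "runs", "<end>"]
    target = script[len(tokens) - 1]
    return {word: (0.7 if word == target else 0.1) for word in script}

caption = greedy_decode(toy_step, "<start>", "<end>")
references = [["a", "dog", "runs"], ["a", "dog", "is", "running"]]
score = bleu(caption, references)
print(caption, score)
```

Greedy search is fast but commits to one word at a time. Because this BLEU variant has no smoothing, any caption with no matching bigram scores zero, so a smoothed implementation may be preferable for short captions.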


Owner

Ragesh Hajela
