GuideDog is a fully voice-controlled, AI/ML-based mobile app designed to assist visually impaired people

Last update: Nov 24, 2021

Overview

Guidedog

Authors: Kyuhee Jo, Steven Gunarso, Jacky Wang, Raghav Sharma

GuideDog is an AI/ML-based mobile app, controlled entirely by voice, that is designed to assist visually impaired people in daily life. As the name suggests, you can think of it as a "speaking guide dog." It offers three key features, all based on the scene captured by your mobile phone:

- Reads text on command
- Describes the scene around you on command
- Warns you if there is an obstacle in front of you

Check out this demo video to learn more about our app!

Android App

UI/UX
- Simple and responsive
- Voice-assistant architecture tailored to the target audience
Libraries / APIs
- Google Cloud Speech-to-Text and Text-to-Speech
- Android SDK, AndroidX
- ML Kit Object Detection and Tracking API
- TensorFlow Lite MobileNet image classification model

Backend

Flask API
- Image Captioning
- Optical Character Recognition
Deployment
- Google App Engine
- Fast central API with separate endpoints
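
As a rough illustration of a central Flask API exposing one endpoint per task, here is a minimal sketch. The route names (`/caption`, `/ocr`), the `image` form field, and the helper functions `caption_image` and `read_text` are all hypothetical placeholders, not the project's actual code; the real backend would call its captioning and OCR models where the stubs return fixed strings.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def caption_image(image_bytes: bytes) -> str:
    """Hypothetical stand-in for the image captioning model."""
    return "placeholder caption"


def read_text(image_bytes: bytes) -> str:
    """Hypothetical stand-in for the OCR step."""
    return "placeholder text"


def _uploaded_image():
    """Return the raw bytes of the uploaded 'image' file, or None if absent."""
    upload = request.files.get("image")
    return upload.read() if upload is not None else None


@app.route("/caption", methods=["POST"])
def caption():
    image_bytes = _uploaded_image()
    if image_bytes is None:
        return jsonify(error="missing 'image' file field"), 400
    return jsonify(caption=caption_image(image_bytes))


@app.route("/ocr", methods=["POST"])
def ocr():
    image_bytes = _uploaded_image()
    if image_bytes is None:
        return jsonify(error="missing 'image' file field"), 400
    return jsonify(text=read_text(image_bytes))
```

Under these assumptions, the mobile client would send the captured frame as a multipart upload to whichever endpoint matches the spoken command and read the JSON response aloud.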

Image Captioning

We used TensorFlow to build and train an image captioning model on MS-COCO 2014, based on the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. The model uses a standard convolutional network as an encoder to extract features from images (we use Inception V3) and feeds the extracted features into an attention-based decoder that generates sentences. While the paper used an LSTM as the decoder, we use a simpler RNN instead.
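
To make the attention step more concrete, here is a minimal pure-Python sketch of one soft-attention step over encoder features. It assumes additive (Bahdanau-style) scoring; the parameter names `W1`, `W2`, and `v` and the toy dimensions are illustrative assumptions, not taken from the project's TensorFlow code.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def additive_attention(features, hidden, W1, W2, v):
    """One soft-attention step.

    features: list of L encoder feature vectors (each of dimension D)
    hidden:   decoder hidden state (dimension H)
    W1: A x D matrix, W2: A x H matrix, v: vector of length A
    Returns (context_vector, attention_weights).
    """
    projected_hidden = matvec(W2, hidden)

    # Score each image location: v . tanh(W1 f + W2 h)
    scores = []
    for feature in features:
        projected_feature = matvec(W1, feature)
        scores.append(
            sum(
                vi * math.tanh(pf + ph)
                for vi, pf, ph in zip(v, projected_feature, projected_hidden)
            )
        )

    # Softmax over locations (shifted by the max for numerical stability)
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Context vector: attention-weighted sum of the feature vectors
    dim = len(features[0])
    context = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return context, weights
```

In a decoder of this kind, the context vector is typically combined with the embedding of the previous word and passed to the recurrent cell, which then predicts the next word.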

Get more insights: Devpost

Owner

Kyuhee Jo
