基于openpose和图像分类的手语识别项目

Overview

手语识别


0、使用到的模型

(1). openpose,作者:CMU-Perceptual-Computing-Lab

https://github.com/CMU-Perceptual-Computing-Lab/openpose

(2). 图像分类classification,作者:Bubbliiiing

https://github.com/bubbliiiing/classification-pytorch

B站对应视频:https://www.bilibili.com/video/BV143411B7wg

(3). 手语教学视频,作者:二碳碳

https://www.bilibili.com/video/BV1XE41137LV

(感谢大佬们的开源项目和教程,都已star加三连)




1、大致思路

方法一: 将视频输入到openpose中,检测出关节点的变化轨迹,将轨迹绘制在一张图片上,把这张图片传到图像分类网络中检测属于哪个动作

视频  ----->  |  openpose  |-----> 关节点运动轨迹图-------> |  图像分类模型  | ----------> 单词分类  

方法二: 将视频输入到openpose中,检测出每一帧中关节点的位置,将多帧进行堆叠,形成一个三维张量,其中两个维度是图片的宽和高,一个维度是时间,然后对这个三维张量使用三维卷积进行训练和预测

视频  ----->  |  openpose  |-----> 多张关节点位置图 ---------> |  堆叠  | --------> 三维张量 -------> |  三维卷积网络  | ----------> 单词分类  



2、环境配置

python:3.7(其他版本会导致openpose无法运行,建议使用anaconda的python环境)
cuda:10
cudnn:7或8应该都行
(配置cuda和cudnn会比较麻烦,如果实在不想配,你可以去openpose的github网站下载使用cpu的版本,这里这个版本应该不支持cpu)


具体的配置环境方式:

(0).python和cuda和cudnn自己装


(1).下载文件

下载代码文件后,再从网盘下载模型和数据文件(没有这些跑不起来),网盘链接:

链接:https://pan.baidu.com/s/1Q2aVVhMhSfWL4qKS9QslkQ 
提取码:abcd 

将从github下载的文件夹和网盘下载的文件夹合并,然后就可以下一步了。 (当然你大可直接找我要u盘拿完整的文件)


(2).安装requirements.txt中的库

cmd进入环境后,cd到项目文件夹下,执行指令:

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple

(3).安装torch和torchvision

先下载好torch(1.2.0)和torchvision(0.4.0)的whl文件,下载地址:

链接:https://pan.baidu.com/s/1QIuJfEE5qQFpXY8ZlHeLNQ 
提取码:abcd 

(当然你依旧可直接找我拿u盘)

下载好torch和torchvision的whl文件后,cmd进入环境,cd到下载文件夹下,执行指令:

pip install [torch或torchvision的whl文件的文件名]

(先装torch再装torchvision,不然有可能会报错)



3、测试运行openpose

项目文件夹下有三个文件:

test.py
test_video.py
test_video_track_point.py

分别对应openpose的功能:检测图片、检测视频、检测视频并绘制关节点轨迹
具体的使用方法可以看文件中的注释部分


在test_video_track_point.py中,取消掉最后几行的注释,就可以将绘制的轨迹图送到classification中去做分类检测
(不过现阶段分类器尚未做好)




4、classification的训练和使用

可以看下classification文件夹中的README.md文件,大佬已经在里边讲得很详细了

This is the code for our paper DAAIN: Detection of Anomalous and AdversarialInput using Normalizing Flows

Merantix-Labs: DAAIN This is the code for our paper DAAIN: Detection of Anomalous and Adversarial Input using Normalizing Flows which can be found at

Merantix 14 Oct 12, 2022
Unofficial implementation of "TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from Scanned Document Images"

TableNet Unofficial implementation of ICDAR 2019 paper : TableNet: Deep Learning model for end-to-end Table detection and Tabular data extraction from

Jainam Shah 243 Dec 30, 2022
Python rubik's cube solver

This program makes a 3D representation of a rubiks cube and solves it step by step.

Pablo QB 4 May 29, 2022
Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader)

ocr-fileformat Validate and transform between OCR file formats (hOCR, ALTO, PAGE, FineReader) Installation Docker System-wide Usage CLI GUI API Transf

Universitätsbibliothek Mannheim 152 Dec 20, 2022
Provides OCR (Optical Character Recognition) services through web applications

OCR4all As suggested by the name one of the main goals of OCR4all is to allow basically any given user to independently perform OCR on a wide variety

174 Dec 31, 2022
Discord QR Scam Code Generator + Token grab mobile device.

A Python script that automatically generates a Nitro scam QR code and grabs the Discord token when scanned.

Visual 9 Nov 22, 2022
Read Japanese manga inside browser with selectable text.

mokuro Read Japanese manga with selectable text inside a browser. See demo: https://kha-white.github.io/manga-demo mokuro_demo.mp4 Demo contains excer

Maciej Budyś 170 Dec 27, 2022
make a better chinese character recognition OCR than tesseract

deep ocr See README_en.md for English installation documentation. 只在ubuntu下面测试通过,需要virtualenv安装,安装路径可自行调整: git clone https://github.com/JinpengLI/deep

Jinpeng 1.5k Dec 28, 2022
Amazing 3D explosion animation using Pygame module.

3D Explosion Animation 💣 💥 🔥 Amazing explosion animation with Pygame. 💣 Explosion physics An Explosion instance is made of a set of Particle objec

Dylan Tintenfich 12 Mar 11, 2022
Primary QPDF source code and documentation

QPDF QPDF is a command-line tool and C++ library that performs content-preserving transformations on PDF files. It supports linearization, encryption,

QPDF 2.2k Jan 04, 2023
The Open Source Framework for Machine Vision

SimpleCV Quick Links: About Installation [Docker] (#docker) Ubuntu Virtual Environment Arch Linux Fedora MacOS Windows Raspberry Pi SimpleCV Shell Vid

Sight Machine 2.6k Dec 31, 2022
MeshToGeotiff - A fast Python algorithm to convert a 3D mesh into a GeoTIFF

MeshToGeotiff - A fast Python algorithm to convert a 3D mesh into a GeoTIFF Python class for converting (very fast) 3D Meshes/Surfaces to Raster DEMs

8 Sep 10, 2022
QED-C: The Quantum Economic Development Consortium provides these computer programs and software for use in the fields of quantum science and engineering.

Application-Oriented Performance Benchmarks for Quantum Computing This repository contains a collection of prototypical application- or algorithm-cent

SRI International 67 Nov 30, 2022
computer vision, image processing and machine learning on the web browser or node.

Image processing and Machine learning labs   computer vision, image processing and machine learning on the web browser or node note Fast Fourier Trans

ryohei tanaka 487 Nov 11, 2022
An advanced 2D image manipulation with features such as edge detection and image segmentation built using OpenCV

OpenCV-ToothPaint3-Advanced-Digital-Image-Editor This application named ‘Tooth Paint’ version TP_2020.3 (64-bit) or version 3 was developed within a w

JunHong 1 Nov 05, 2021
LEARN OPENCV IN 3 HOURS USING PYTHON - INCLUDING EXAMPLE PROJECTS

LEARN OPENCV IN 3 HOURS USING PYTHON - INCLUDING EXAMPLE PROJECTS

Murtaza Hassan 815 Dec 29, 2022
Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform sign language recognition.

Sign Language Recognition Service This is a Sign Language Recognition service utilizing a deep learning model with Long Short-Term Memory to perform s

Martin Lønne 1 Jan 08, 2022
Regions sanitàries (RS), Sectors Sanitàris (SS) i Àrees Bàsiques de Salut (ABS) de Catalunya

Regions sanitàries (RS), Sectors Sanitaris (SS), Àrees de Gestió Assistencial (AGA) i Àrees Bàsiques de Salut (ABS) de Catalunya Fitxers GeoJSON de le

Glòria Macià Muñoz 2 Jan 23, 2022
TensorFlow Implementation of FOTS, Fast Oriented Text Spotting with a Unified Network.

FOTS: Fast Oriented Text Spotting with a Unified Network I am still working on this repo. updates and detailed instructions are coming soon! Table of

Masao Taketani 52 Nov 11, 2022
Smart computer vision application

Smart-computer-vision-application Backend : opencv and python Library required:

2 Jan 31, 2022