使用深度学习框架提取视频硬字幕;docker容器免安装深度学习库,使用本地api接口使得界面和后端识别分离;

Overview

extract-video-subtittle

使用深度学习框架提取视频硬字幕;

本地识别无需联网;

CPU识别速度可观;

容器提供API接口;

运行环境

本项目运行环境非常好搭建,我做好了docker容器免安装各种深度学习包;

提供windows界面操作;

容器为CPU版本;

视频演示

https://www.bilibili.com/video/BV18Q4y1f774/

程序说明

1、先启动后端容器实例

docker run -d -p 6666:6666 m986883511/extract_subtitles

image-20210801214757813

2、启动程序

简单介绍页面

1:点击左边按钮连接第一步启动的容器;

2:视频提取字幕的总进度

3:当前视频帧显示的位置,就是视频进度条

4:识别出来的文字会在这里显示一下

image-20210801215010179

image-20210801215258761

3、点击选择视频确认字幕位置

点击选择视频按钮,这时你可以拖动进度条到有字幕的位置;然后点击选择字幕区域;在视频中画一个矩形;

image-20210801215258761

4、点击测试连接API

image-20210801220206554

后端没问题的话,会显示已连通;此时所有步骤准备就绪

5、开始识别

点击请先完成前几步按钮,内部分为这几个步骤

  1. 本地通过ffmpeg提取视频声音保存到temp目录(0%-10%)
  2. api通信将声音文件发送到容器内,容器内spleeter库提取声音中人声,结果保存在容器内temp目录,很耗时间,吃CPU和内存(10%-30)
  3. api通信,将人声根据停顿分片,返回分片结果,耗较短的时间(30%-40%)
  4. 根据说话分片时间开始识别字幕(40-%100%)

当100%的时候查看temp目录就生成了和视频同名的srt字幕文件

运行后台

后端接口容器地址Docker Hub

此过程可能时间较长,您需要预先安装好好docker,并配置好docker加速器,你可能需要先docker login

docker run -d -p 6666:6666 m986883511/extract_subtitles

本项目缺少文件

因网速墙的问题,大文件推送不上去,可以参考.gitignore中写的

其他

视频提取

# 视频片段提取
ffmpeg -ss 00:15:45 -t 00:02:15 -i test/three_body_3_7.mp4 -vcodec copy -acodec copy test/3body.mp4
# 打包界面程序
C:/Python/Python38-32/Scripts/pyinstaller.exe main.spec

参考资料

本项目中深度学习源代码为/docker/backend

原作者为:https://github.com/YaoFANGUK/video-subtitle-extractor

You might also like...
Comments
  • 提取人声一直没结果

    提取人声一直没结果

    image 视频是40多分钟的连续剧。CPU版本。之前用YaoFANGUK/video-subtitle-extractor提取字幕很成功也准确,但时间比较长。看到作者用音频分析减少了识别的帧数,所以试了一下。但在提取人声时,已经等待了近50分钟没有结果。而且CPU的占用只有1%左右,这明显不正常。用YaoFANGUK/video-subtitle-extractor整个的耗时可能都没有这么久。另外autosub也是提取音频来语音识别字幕,识别人声也很快,同样的视频几分钟就完了。麻烦作者看看是出了什么问题呢。

    opened by royzengyi 2
  • 项目咨询

    项目咨询

    Hello,我尝试了一下这个软件,感觉还是不错的,不过在实际使用中还是会有不少问题。

    我是一个独立开发者,这边愿意付费或者合作来完善一下,让这个项目更具实用性,不知道你有没有兴趣呢?

    没有找到联系方式,只好通过issue来试一下,你可以在看到之后删除,谢谢。

    我的邮箱是yedaxia#foxmail.com

    opened by YeDaxia 1
Releases(0.2.0)
Owner
歌者
失去人性,失去很多;失去兽性,失去一切;活着才能燃烧自己。
歌者
Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images"

GANInversion_with_ConsecutiveImgs Official code for our ICCV paper: "From Continuity to Editability: Inverting GANs with Consecutive Images" https://a

QingyangXu 38 Dec 07, 2022
A unified framework to jointly model images, text, and human attention traces.

connect-caption-and-trace This repository contains the reference code for our paper Connecting What to Say With Where to Look by Modeling Human Attent

Meta Research 73 Oct 24, 2022
Datasets, Transforms and Models specific to Computer Vision

vision Datasets, Transforms and Models specific to Computer Vision Installation First install the nightly version of OneFlow python3 -m pip install on

OneFlow 68 Dec 07, 2022
A working implementation of the Categorical DQN (Distributional RL).

Categorical DQN. Implementation of the Categorical DQN as described in A distributional Perspective on Reinforcement Learning. Thanks to @tudor-berari

Florin Gogianu 98 Sep 20, 2022
Convex optimization for fun and profit.

CFMM Optimal Routing This repository contains the code needed to generate the figures used in the paper Optimal Routing for Constant Function Market M

Guillermo Angeris 183 Dec 29, 2022
Learning View Priors for Single-view 3D Reconstruction (CVPR 2019)

Learning View Priors for Single-view 3D Reconstruction (CVPR 2019) This is code for a paper Learning View Priors for Single-view 3D Reconstruction by

Hiroharu Kato 38 Aug 17, 2022
Official pytorch implementation of paper "Inception Convolution with Efficient Dilation Search" (CVPR 2021 Oral).

IC-Conv This repository is an official implementation of the paper Inception Convolution with Efficient Dilation Search. Getting Started Download Imag

Jie Liu 111 Dec 31, 2022
Source code of all the projects of Udacity Self-Driving Car Engineer Nanodegree.

self-driving-car In this repository I will share the source code of all the projects of Udacity Self-Driving Car Engineer Nanodegree. Hope this might

Andrea Palazzi 2.4k Dec 29, 2022
Official PyTorch implementation of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image", ICCV 2019

PoseNet of "Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image" Introduction This repo is official Py

Gyeongsik Moon 677 Dec 25, 2022
Code for "MetaMorph: Learning Universal Controllers with Transformers", Gupta et al, ICLR 2022

MetaMorph: Learning Universal Controllers with Transformers This is the code for the paper MetaMorph: Learning Universal Controllers with Transformers

Agrim Gupta 50 Jan 03, 2023
Caffe implementation for Hu et al. Segmentation for Natural Language Expressions

Segmentation from Natural Language Expressions This repository contains the Caffe reimplementation of the following paper: R. Hu, M. Rohrbach, T. Darr

10 Jul 27, 2021
The official PyTorch implementation of Curriculum by Smoothing (NeurIPS 2020, Spotlight).

Curriculum by Smoothing (NeurIPS 2020) The official PyTorch implementation of Curriculum by Smoothing (NeurIPS 2020, Spotlight). For any questions reg

PAIR Lab 36 Nov 23, 2022
A Closer Look at Invalid Action Masking in Policy Gradient Algorithms

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms This repo contains the source code to reproduce the results in the paper A Close

Costa Huang 73 Dec 24, 2022
MASS (Mueen's Algorithm for Similarity Search) - a python 2 and 3 compatible library used for searching time series sub-sequences under z-normalized Euclidean distance for similarity.

Introduction MASS allows you to search a time series for a subquery resulting in an array of distances. These array of distances enable you to identif

Matrix Profile Foundation 79 Dec 31, 2022
Emulation and Feedback Fuzzing of Firmware with Memory Sanitization

BaseSAFE This repository contains the BaseSAFE Rust APIs, introduced by "BaseSAFE: Baseband SAnitized Fuzzing through Emulation". The example/ directo

Security in Telecommunications 138 Dec 16, 2022
(AAAI2020)Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing

Grapy-ML: Graph Pyramid Mutual Learning for Cross-dataset Human Parsing This repository contains pytorch source code for AAAI2020 oral paper: Grapy-ML

54 Aug 04, 2022
Robot Servers and Server Manager software for robo-gym

robo-gym-server-modules Robot Servers and Server Manager software for robo-gym. For info on how to use this package please visit the robo-gym website

JR ROBOTICS 4 Aug 16, 2021
Official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo'

IterMVS official source code of paper 'IterMVS: Iterative Probability Estimation for Efficient Multi-View Stereo' Introduction IterMVS is a novel lear

Fangjinhua Wang 127 Jan 04, 2023
DAFNe: A One-Stage Anchor-Free Deep Model for Oriented Object Detection

DAFNe: A One-Stage Anchor-Free Deep Model for Oriented Object Detection Code for our Paper DAFNe: A One-Stage Anchor-Free Deep Model for Oriented Obje

Steven Lang 58 Dec 19, 2022
FG-transformer-TTS Fine-grained style control in transformer-based text-to-speech synthesis

LST-TTS Official implementation for the paper Fine-grained style control in transformer-based text-to-speech synthesis. Submitted to ICASSP 2022. Audi

Li-Wei Chen 64 Dec 30, 2022