Video understanding
2022-04-23 19:30:00 【Lao wa next door】
Video understanding is an important task in computer vision. In recent years, driven by deep learning and supervised learning in particular, the field has advanced rapidly, and tasks such as video action classification and video summarization have achieved remarkable results.
However, in many real-world scenarios a single label per video clip does not carry enough information. For example, when a robot pours water into a cup, a simple "pouring liquid" label is not enough to predict when the cup will be full; the robot needs to track the amount of water in the cup frame by frame. Similarly, in sports analysis, a baseball coach does not just want to find the pitch but to pinpoint the exact moment the ball leaves the pitcher's hand, and a single video-level label cannot support that kind of retrieval. Video understanding models therefore need to be able to understand video frame by frame.
With supervised learning, however, this becomes very expensive: every frame in the video needs fine-grained action annotation, and each new action requires new labels to provide the supervision signal. Yet from robotics to sports analysis there is strong demand for fine-grained video understanding. So how can a model learn fine-grained information from video without requiring a large number of labels?
Researchers from Google have proposed a self-supervised method called Temporal Cycle-Consistency Learning (TCC). It achieves fine-grained temporal video understanding by learning representations from different videos of the same kind of process, and offers a new approach to frame-by-frame video retrieval, action analysis, video synchronization, and transfer across modalities.
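The text above does not spell out how cycle consistency is checked. As a minimal sketch of the core idea, assume each frame has already been mapped to an embedding vector by some encoder (not shown here). A frame in one video is then cycle-consistent if its nearest neighbor in a second video maps back to the same frame by nearest neighbor. The function names, the toy 1-D embeddings, and the use of squared Euclidean distance are illustrative assumptions; the actual training method relies on a differentiable (soft) version of this check as its loss, which is not reproduced here.

```python
from typing import List, Sequence

Embedding = Sequence[float]  # one embedding vector per frame (assumed precomputed)


def sq_dist(a: Embedding, b: Embedding) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query: Embedding, candidates: Sequence[Embedding]) -> int:
    """Index of the candidate embedding closest to the query."""
    return min(range(len(candidates)), key=lambda j: sq_dist(query, candidates[j]))


def cycle_consistent_frames(u: Sequence[Embedding], v: Sequence[Embedding]) -> List[bool]:
    """For each frame i in video u: go to its nearest neighbor in v,
    come back to the nearest neighbor in u, and check that we land on i."""
    flags = []
    for i, u_i in enumerate(u):
        j = nearest(u_i, v)       # u -> v
        k = nearest(v[j], u)      # v -> u
        flags.append(k == i)
    return flags


# Toy example: two "videos" of the same process with 1-D frame embeddings.
aligned_u = [[0.0], [1.0], [2.0]]
aligned_v = [[0.1], [1.1], [2.1], [5.0]]
aligned_flags = cycle_consistent_frames(aligned_u, aligned_v)

# Second toy example: frame 1 of u has no good counterpart in v.
sparse_u = [[0.0], [0.4], [2.0]]
sparse_v = [[0.1], [2.1]]
sparse_flags = cycle_consistent_frames(sparse_u, sparse_v)
```

In a sketch like this, frames flagged as not cycle-consistent indicate where the embeddings fail to align the two videos; training would aim to reduce such failures.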
Understanding dynamic behavior in video is a key direction for the future development of AI.
Video action understanding includes video classification, action recognition, temporal action detection, and video summarization.
I recently organized the papers I have read, mainly on video classification, action recognition, and video datasets, listed the best reported results on the relevant datasets, and shared the list on GitHub.
On the HMDB51 dataset, the DOVF+MIFS method achieves the highest accuracy, 75%; there is still considerable room for improvement on this dataset;
On the UCF101 dataset, the TLE method achieves the highest accuracy, 95.6%;
On the ActivityNet dataset, the UntrimmedNet (hard) method achieves the best result, 91.3%;
On the Sports-1M dataset, the LSTM+Pretrained on YT-8M method achieves the best Hit@1 and Hit@5, 74.2% and 92.4% respectively, with an mAP of 67.6%;
On the YouTube-8M dataset, the WILLOW team's method achieves the best result, 84.967%.
Awesome Video Understanding
2. Extracting keywords from the semantics of video content:
1) Capture screenshots of frames from the video:
- Some approaches capture every frame;
- Others capture a frame only when the shot changes. How do you tell whether the shot has changed? Compute the difference between two frames; if the difference is large, the shot has changed and a new screenshot should be taken.
2) Recognize the semantics of the screenshots;
3) Convert the video's speech to text;
4) Recognize the semantics of the text;
5) Combine the semantics obtained from the screenshots with the semantics obtained from the text; the result is the semantics of the video.
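The shot-change test in step 1 can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes frames are already decoded into flattened grayscale pixel lists of equal length, compares consecutive frames by mean absolute pixel difference, and uses an arbitrary threshold of 30. The names `mean_abs_diff` and `select_keyframes` and the threshold value are hypothetical and would need tuning on real footage.

```python
from typing import List, Sequence

Frame = Sequence[int]  # flattened grayscale pixels in the range 0-255 (assumed format)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_keyframes(frames: Sequence[Frame], threshold: float = 30.0) -> List[int]:
    """Return indices of frames to screenshot: the first frame, plus every
    frame that differs from its predecessor by more than the threshold."""
    if not frames:
        return []
    keyframes = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)  # large difference: treat as a shot change
    return keyframes


# Toy example: two dark frames followed by two bright frames (one shot change).
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 210, 205, 198],
    [201, 209, 206, 197],
]
keyframe_indices = select_keyframes(frames)
```

A plain pixel difference is sensitive to fast motion and lighting changes; comparing color histograms instead is a common alternative, at the cost of ignoring spatial layout.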
Copyright notice
This article was written by [Lao wa next door]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231923218641.html