Video understanding
2022-04-23 19:30:00 【Lao wa next door】
Video understanding is an important task in computer vision. In recent years, with the application of deep learning, and supervised learning in particular, the field has advanced rapidly: tasks such as video action classification and video clip summarization have achieved remarkable results.
However, in many real-world scenarios a single label per video clip does not provide enough information. For example, when a robot pours water into a cup, a simple "pour liquid" label is not enough to predict when the cup will be full; the robot needs to track the amount of water in the cup frame by frame. Likewise, in sports analysis, a baseball coach does not just want to see the pitch but wants to pinpoint the exact moment the ball leaves the pitcher's hand, and a single video-level label cannot support that kind of retrieval. The video understanding model therefore needs the ability to understand video frame by frame.
With supervised learning, however, this becomes very expensive: every frame in the video needs fine-grained action annotation, and each new action requires new labels to provide the supervision signal. Yet from robotics to sports analysis there is strong demand for fine-grained video understanding. So how can a model learn fine-grained information from video without requiring a large number of labels?
Researchers at Google have proposed a self-supervised method called Temporal Cycle-Consistency Learning (TCC). It achieves fine-grained temporal video understanding by learning representations from different videos of the same kind of process, and offers a new approach to frame-by-frame video retrieval, action analysis, video synchronization, and cross-modal transfer.
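To make the idea of cycle consistency concrete, here is a minimal sketch, not the authors' implementation. It assumes per-frame embeddings already exist for two videos of the same process and only checks which frames of video A return to themselves after a nearest-neighbor hop to video B and back. The actual TCC method trains the embedding network with a differentiable (soft) version of this cycle; the function names and toy embeddings below are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def nearest(query, candidates):
    """Return the index of the candidate embedding closest to `query`."""
    return min(range(len(candidates)), key=lambda j: dist(query, candidates[j]))


def cycle_consistent_frames(emb_a, emb_b):
    """Return indices i of video A whose A -> B -> A nearest-neighbor cycle ends at i."""
    consistent = []
    for i, frame in enumerate(emb_a):
        j = nearest(frame, emb_b)     # closest frame in video B
        k = nearest(emb_b[j], emb_a)  # cycle back to video A
        if k == i:
            consistent.append(i)
    return consistent


# Toy 2-D embeddings (hypothetical values, one tuple per frame).
video_a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
video_b = [(0.1, 0.0), (0.9, 0.1), (1.1, 0.0), (2.1, 0.0)]
print(cycle_consistent_frames(video_a, video_b))
```

Frames that pass this check are aligned across the two videos without any per-frame labels, which is what makes the signal usable for self-supervision.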
Understanding dynamic behavior in video is a key direction for the future development of AI.
Video behavior understanding includes video classification, action recognition, temporal action detection, and video summary generation.
I recently organized the papers I have read, mainly on video classification, action recognition, and video datasets, listed the best reported results on the relevant datasets, and shared them on GitHub.
On the HMDB51 dataset, the DOVF+MIFS method achieves the highest accuracy, 75%; there is still considerable room for improvement on this dataset;
On the UCF101 dataset, the TLE method achieves the highest accuracy, 95.6%;
On the ActivityNet dataset, the UntrimmedNet (hard) method achieves the best result, 91.3%;
On the Sports-1M dataset, the LSTM+Pretrained on YT-8M method achieves the best Hit@1 and Hit@5, 74.2% and 92.4% respectively, with an mAP of 67.6%;
On the YouTube-8M dataset, the WILLOW team's method achieves the best result, 84.967%.
Awesome Video Understanding
2. Extracting keywords for the semantics of video content:
1) Capture screenshots of video frames:
- Some approaches capture every frame;
- Others capture a new screenshot only when the shot changes. How do you determine whether the shot has changed? Compute the difference between the two frames; if the difference is large, the shot has changed and a new screenshot is needed.
2) Recognize the semantics of the screenshots;
3) Convert the video's speech into text;
4) Run semantic recognition on the text;
5) Combine the semantics obtained from the screenshots with the semantics obtained from the text to get the semantics of the video.
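The shot-change test in step 1 and the merge in step 5 can be sketched as follows. This is only an illustration under stated assumptions: each frame is a flat list of grayscale pixel values (0 to 255), the difference measure is the mean absolute pixel difference, and the threshold of 30 is an arbitrary example value. The function names are hypothetical, and image recognition and speech-to-text (steps 2 to 4) are left out.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_keyframes(frames, threshold=30.0):
    """Return indices of frames to screenshot: the first frame, plus every
    frame that differs from its predecessor by more than `threshold`."""
    if not frames:
        return []
    keyframes = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)  # large difference: treat as a shot change
    return keyframes


def merge_keywords(visual_keywords, speech_keywords):
    """Combine keywords from screenshots and from transcribed speech,
    keeping first-seen order and dropping duplicates."""
    merged = []
    for word in visual_keywords + speech_keywords:
        if word not in merged:
            merged.append(word)
    return merged


# Four tiny 4-pixel "frames": the shot changes between frame 1 and frame 2.
frames = [
    [10, 10, 10, 10],
    [12, 11, 10, 9],
    [200, 210, 190, 205],
    [201, 209, 191, 204],
]
print(select_keyframes(frames))
print(merge_keywords(["robot", "cup"], ["water", "cup"]))
```

A fixed threshold is the simplest choice; in practice it would need tuning per source, since gradual transitions and camera motion change how large a "big" difference is.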
Copyright notice
This article was written by [Lao wa next door]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231923218641.html