Video understanding
2022-04-23 19:30:00 【Lao wa next door】
Video understanding is an important task in computer vision. In recent years, with the application of deep learning, and supervised learning in particular, it has advanced rapidly: tasks such as video action classification and video clip summarization have achieved remarkable results.
However, in many real-world scenarios a single label per video clip does not provide enough information. For example, when a robot pours water into a cup, a simple "pour liquid" label is not enough to predict when the cup is full; the robot needs to track the amount of water in the cup frame by frame. Likewise, in sports analysis, a baseball coach does not just want to see the pitch but wants to pinpoint the exact moment the ball leaves the pitcher's hand, and a single video-level label cannot support that kind of video retrieval task. This means a video understanding model needs the ability to understand video frame by frame.
With supervised learning, however, this becomes very expensive: it requires fine-grained, per-frame annotation of the actions in each video, and training on a different action requires new labels to provide the supervision signal. Yet from robotics to sports analysis there is strong demand for fine-grained video understanding. So how can a model learn fine-grained information from video without requiring a large number of labels?
Researchers from Google have proposed a self-supervised method called Temporal Cycle-Consistency Learning (TCC). It achieves fine-grained temporal video understanding by learning representations of similar processes across different samples, and offers a new approach to frame-by-frame video retrieval, action analysis, video synchronization, and multimodal transfer.
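To make the idea of cycle consistency concrete, here is a minimal sketch, not the researchers' implementation. It assumes each frame has already been mapped to an embedding vector, and it uses a hard nearest-neighbor check: a frame in one video is cycle-consistent if its nearest neighbor in a second video maps back to the same frame. The published method trains with a differentiable relaxation of this idea; the function names and the toy 2-D embeddings below are made up for illustration.

```python
import math


def nearest(query, candidates):
    """Index of the candidate embedding closest to `query` (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda k: math.dist(query, candidates[k]))


def cycle_consistent_frames(video_u, video_v):
    """Indices i in video_u for which the cycle u_i -> v_j -> u_k returns to k == i."""
    consistent = []
    for i, u in enumerate(video_u):
        j = nearest(u, video_v)           # step 1: nearest frame in the other video
        k = nearest(video_v[j], video_u)  # step 2: cycle back to the first video
        if k == i:
            consistent.append(i)
    return consistent


# Hypothetical per-frame embeddings for two videos of the same process.
video_u = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (2.1, 0.0)]
video_v = [(0.1, 0.0), (1.1, 0.0), (2.3, 0.0)]

print(cycle_consistent_frames(video_u, video_v))  # [0, 1, 3]
```

Frame 2 of the first video is not cycle-consistent here: its nearest neighbor in the second video is frame 2, but that frame cycles back to frame 3. A representation in which more frames cycle back to themselves aligns the two videos better, which is the intuition behind using cycle consistency as a training signal without per-frame labels.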
Understanding dynamic behavior in video is a key direction for the future development of AI.
Video behavior understanding includes video classification, action recognition, temporal action detection, and video summary generation.
I recently organized the papers I have read, mainly on video classification, action recognition, and video datasets, listed the best reported results on the relevant datasets, and shared them on GitHub.
On the HMDB51 dataset, the DOVF+MIFS method achieves the highest accuracy, 75%; there is still considerable room for improvement on this dataset;
On the UCF101 dataset, the TLE method achieves the highest accuracy, 95.6%;
On the ActivityNet dataset, the UntrimmedNet (hard) method achieves the best result, 91.3%;
On the Sports-1M dataset, the LSTM+Pretrained on YT-8M method achieves the best Hit@1 and Hit@5, 74.2% and 92.4% respectively, with an mAP of 67.6%;
On the YouTube-8M dataset, the WILLOW team's method achieves the best result, 84.967%.
Awesome Video Understanding
2. Keyword extraction for the semantics of video content:
1) Take frame screenshots of the video:
- Some approaches capture every frame;
- Others capture a new screenshot only when the shot changes. How do you tell whether the shot has changed? Compute the difference between two frames; if the difference is large, the shot has changed and a new screenshot is needed.
2) Recognize the semantics of the screenshots;
3) Convert the video's speech into text;
4) Run semantic recognition on the text;
5) Merge the semantics obtained from the screenshots with the semantics obtained from the text; the result is the semantics of the video.
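The shot-change test in step 1 can be sketched as follows. This is only an illustration under stated assumptions: frames are small grayscale images stored as nested lists of integers, the difference measure is the mean absolute pixel difference between consecutive frames, and the threshold of 30.0 is an arbitrary value that would need tuning on real footage. The function names are hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def select_keyframes(frames, threshold=30.0):
    """Indices of frames to screenshot: the first frame, plus every frame
    that differs from its predecessor by more than `threshold`."""
    if not frames:
        return []
    keyframes = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)  # large difference: treat as a shot change
    return keyframes


def flat_frame(value, size=4):
    """Toy frame in which every pixel has the same brightness."""
    return [[value] * size for _ in range(size)]


# Three "shots": two dark frames, two bright frames, one mid-gray frame.
frames = [flat_frame(10), flat_frame(12), flat_frame(200), flat_frame(198), flat_frame(40)]
print(select_keyframes(frames))  # [0, 2, 4]
```

Only the selected frames would then go on to the recognition step, which keeps the number of screenshots far below one per frame.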
Copyright notice
This article was written by [Lao wa next door]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231923218641.html