Video understanding
2022-04-23 19:30:00 【Lao wa next door】
Video understanding is an important task in computer vision. In recent years, driven by deep learning and supervised learning in particular, it has developed rapidly: tasks such as video action classification and video clip summarization have achieved remarkable results.
In many real-world scenarios, however, a single label per video clip does not carry enough information. For example, when a robot pours water into a cup, a simple "pouring liquid" label is not enough to predict when the cup is full; the robot needs to track the amount of water in the cup frame by frame. Similarly, in sports analysis, a baseball coach does not just want to find the pitch but to pinpoint the exact moment the ball leaves the pitcher's hand, and a single video-level label cannot support that kind of retrieval. Video understanding models therefore need to be able to understand video frame by frame.
With supervised learning, however, this becomes very expensive: every frame in the video needs fine-grained action annotation, and each new action needs new labels to provide the supervision signal. Yet from robotics to sports analysis there is strong demand for fine-grained video understanding. So how can a model learn fine-grained information from video without requiring a large number of labels?
Researchers at Google have proposed a self-supervised method called Temporal Cycle-Consistency Learning (TCC). It achieves fine-grained temporal video understanding by learning representations of the same process across different example videos, and offers a new approach to frame-by-frame video retrieval, action analysis, video synchronization, and cross-modal transfer.
Understanding dynamic behavior in video is a key direction for the future development of AI.
Video behavior understanding includes video classification, action recognition, temporal action detection, and video summarization.
I recently organized the papers I have read, mainly on video classification, action recognition, and video datasets, listed the best reported results on the relevant datasets, and shared the list on GitHub.
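The cycle-consistency idea can be illustrated with a small sketch. It assumes each frame of two videos has already been mapped to an embedding vector (the toy 2-D values below are made up), and it checks whether a frame's nearest neighbour in the other video maps back to the same frame. The function names are hypothetical, and this hard nearest-neighbour check is only an illustration of the concept, not the training loss used by the method.

```python
import math


def nearest(query, candidates):
    """Index of the candidate embedding closest (Euclidean) to the query."""
    return min(range(len(candidates)), key=lambda j: math.dist(query, candidates[j]))


def cycle_consistent_frames(emb_u, emb_v):
    """Indices i of video U whose nearest neighbour in V maps back to i."""
    consistent = []
    for i, u in enumerate(emb_u):
        j = nearest(u, emb_v)          # step U -> V
        k = nearest(emb_v[j], emb_u)   # step V -> U
        if k == i:                     # the cycle returned to its start
            consistent.append(i)
    return consistent


# Toy per-frame embeddings (assumed values, for illustration only).
emb_u = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
emb_v = [(0.1, 0.0), (1.9, 0.1)]
print(cycle_consistent_frames(emb_u, emb_v))  # [0, 2]
```

Frames that survive the round trip are the ones whose embeddings align across the two videos, which is what makes frame-level retrieval and synchronization possible without per-frame labels.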
On the HMDB51 dataset, the DOVF+MIFS method achieves the highest accuracy, 75%; there is still considerable room for improvement on this dataset;
On the UCF101 dataset, the TLE method achieves the highest accuracy, 95.6%;
On the ActivityNet dataset, the UntrimmedNet (hard) method achieves the best result, 91.3%;
On the Sports-1M dataset, the LSTM+Pretrained on YT-8M method achieves the best Hit@1 and Hit@5, 74.2% and 92.4% respectively, with an mAP of 67.6%;
On the YouTube-8M dataset, the WILLOW team's method achieves the best result, 84.967%.
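For readers unfamiliar with the Hit@k figures quoted for Sports-1M, here is a minimal sketch of how such a metric can be computed. It assumes the common definition: the fraction of test samples for which at least one ground-truth label appears among the k highest-scoring predictions. The function name and the toy scores are hypothetical.

```python
def hit_at_k(scores, true_labels, k):
    """Fraction of samples with at least one true label in the top-k predictions."""
    hits = 0
    for sample_scores, labels in zip(scores, true_labels):
        # Class indices sorted by descending score, truncated to the top k.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if labels & set(ranked[:k]):
            hits += 1
    return hits / len(scores)


# Toy data: three samples, three classes (assumed values).
scores = [
    [0.7, 0.2, 0.1],  # top-1 prediction is class 0
    [0.1, 0.3, 0.6],  # top-1 prediction is class 2
    [0.2, 0.5, 0.3],  # top-1 prediction is class 1
]
true_labels = [{0}, {1}, {1, 2}]

print(hit_at_k(scores, true_labels, 1))  # two of three samples hit
print(hit_at_k(scores, true_labels, 2))  # 1.0
```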
Awesome Video Understanding
2. Keyword extraction for the semantics of video content:
1) Capture frames from the video:
- Some approaches capture every frame;
- Others capture a new frame only when the shot changes. How can a shot change be detected? Compute the difference between two frames; if the difference is large, the shot has changed and a new screenshot should be taken.
2) Recognize the semantics of the screenshots;
3) Convert the video's speech into text;
4) Perform semantic recognition on the text;
5) Combine the semantics obtained from the screenshots with the semantics obtained from the text; the result is the semantics of the video.
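The shot-change rule in step 1) can be sketched as follows. This is a minimal illustration under several assumptions: frames are small grayscale images stored as nested lists of pixel values, the "difference" is the mean absolute pixel difference between consecutive frames, and the threshold of 30.0 is an arbitrary illustrative value. In practice the frames would come from a video decoding library, and the function names here are hypothetical.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


def select_keyframes(frames, threshold=30.0):
    """Indices to capture: the first frame plus every frame that differs
    from its predecessor by more than the (assumed) threshold."""
    if not frames:
        return []
    keyframes = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)  # large difference: treat as a shot change
    return keyframes


# Toy 2x2 frames: two dark frames followed by two bright ones.
dark = [[10, 10], [10, 10]]
dark_noisy = [[12, 11], [10, 9]]
bright = [[200, 210], [205, 198]]
frames = [dark, dark_noisy, bright, bright]

print(select_keyframes(frames))  # [0, 2]
```

Small frame-to-frame noise stays below the threshold, so only the first frame and the frame where the scene brightens are selected for semantic recognition.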
Copyright notice
This article was written by [Lao wa next door]. Please include a link to the original when reposting. Thanks.
https://yzsam.com/2022/04/202204231923218641.html