Video understanding
2022-04-23 19:30:00 【Lao wa next door】
Video understanding is an important task in computer vision. In recent years, driven by deep learning and especially supervised learning, video understanding has developed rapidly, and tasks such as video action classification and video clip summarization have achieved remarkable results.
However, in many real-world applications a single label per video clip does not provide enough information. For instance, when a robot pours water into a cup, a simple "pour liquid" label is not enough to predict when the cup will be full; the robot needs to track the amount of water in the cup frame by frame. Similarly, in sports analysis, a baseball coach wants not just to see the pitch but to pinpoint the exact moment the ball leaves the pitcher's hand, and a single video-level label is not enough for this kind of retrieval. Video understanding models therefore need the ability to understand video frame by frame.
With supervised learning, however, this becomes very expensive: every frame of every video must be annotated with fine-grained action labels, and each new action to be learned requires new labels to provide the supervision signal. Yet from robotics to sports analysis there is strong demand for fine-grained video understanding. So how can a model learn fine-grained information from video without requiring large numbers of labels?
Researchers at Google have proposed a self-supervised method called Temporal Cycle-Consistency Learning (TCC). It achieves fine-grained temporal video understanding by learning frame-level representations that match up corresponding moments across different videos of a similar process, offering a new approach to frame-by-frame video retrieval, action analysis, video synchronization, and multimodal transfer.
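To make the cycle-consistency idea behind TCC concrete, here is a minimal NumPy sketch. It is a simplified illustration under stated assumptions, not Google's implementation: the per-frame embeddings are assumed to be given (in TCC they come from an encoder trained with this kind of loss), and the function names and the `temperature` parameter are our own choices. Frame i of video U is cycle-consistent if its nearest neighbor in video V maps back to frame i. The soft version replaces the hard nearest neighbor with a softmax-weighted average, turning the check into a differentiable classification loss.

```python
import numpy as np


def log_softmax(x):
    """Numerically stable log-softmax over a 1-D array."""
    shifted = x - np.max(x)
    return shifted - np.log(np.sum(np.exp(shifted)))


def cycle_back_index(u, v, i):
    """Hard cycle: U[i] -> nearest frame in V -> nearest frame back in U.

    u: (N, D) per-frame embeddings of video U; v: (M, D) embeddings of video V.
    Frame i is cycle-consistent when the returned index equals i.
    """
    j = int(np.argmin(np.linalg.norm(v - u[i], axis=1)))
    return int(np.argmin(np.linalg.norm(u - v[j], axis=1)))


def cycle_back_classification_loss(u, v, i, temperature=0.1):
    """Soft, differentiable variant: cross-entropy of cycling back to frame i.

    `temperature` is an assumed hyperparameter that sharpens the softmax.
    """
    # Soft nearest neighbour of U[i] in V: softmax over negative squared distances.
    alpha = np.exp(log_softmax(-np.sum((v - u[i]) ** 2, axis=1) / temperature))
    v_tilde = alpha @ v
    # Log-probabilities over U's frames for where v_tilde cycles back to.
    log_beta = log_softmax(-np.sum((u - v_tilde) ** 2, axis=1) / temperature)
    return float(-log_beta[i])


# Usage: the same process (a point sweeping a half circle) "filmed" at two
# frame rates, 10 frames for U and 15 frames for V.
t_u = np.linspace(0.0, 1.0, 10)
t_v = np.linspace(0.0, 1.0, 15)
u = np.stack([np.cos(np.pi * t_u), np.sin(np.pi * t_u)], axis=1)
v = np.stack([np.cos(np.pi * t_v), np.sin(np.pi * t_v)], axis=1)
print([cycle_back_index(u, v, i) for i in range(len(u))])  # → [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Because both synthetic videos show the same motion at different frame rates, every frame of U cycles back to itself. With real videos, training the encoder to minimize the soft loss over many video pairs is what pushes corresponding moments of different videos toward nearby embeddings.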
Understanding dynamic behavior in video is a key direction for the future development of AI.
Video behavior understanding covers video classification, action recognition, temporal action detection, and video summarization.
I recently organized the papers I have read, mainly on video classification, action recognition, and video datasets, listed the state of the art on each dataset, and shared the result on GitHub.
On HMDB51, the DOVF+MIFS method achieves the highest accuracy, 75%, so there is still plenty of room for improvement on this dataset;
On UCF101, the TLE method achieves the highest accuracy, 95.6%;
On ActivityNet, the UntrimmedNet (hard) method achieves the best result, 91.3%;
On Sports-1M, the LSTM+Pretrained on YT-8M method achieves the best Hit@1 and Hit@5, at 74.2% and 92.4% respectively, with an mAP of 67.6%;
On YouTube-8M, the WILLOW team's approach achieves the best result, 84.967%.
Awesome Video Understanding
2. Extracting keywords that capture the semantics of video content:
1) Capture frames (screenshots) from the video:
- some approaches capture every frame;
- others capture a new frame only when the shot changes. To decide whether the shot has changed, compute the difference between two consecutive frames; if the difference is large, the shot has changed and a new screenshot is needed.
2) Recognize the semantics of the captured frames;
3) Convert the video's speech into text;
4) Extract semantics from the resulting text;
5) Combine the semantics obtained from the frames with the semantics obtained from the text; the result is the semantics of the video.
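The keyword-extraction pipeline can be sketched in Python, under stated assumptions. Step 1 uses the mean absolute pixel difference between consecutive frames with a hypothetical threshold of 30 (the post only says to compute the difference between two pictures; histogram comparison is a common alternative). Steps 2 to 4 need image-recognition, speech-to-text, and text-analysis models that are not shown, so their outputs are passed in as plain keyword lists. Step 5 fuses them with a simple weighted vote. The function names, threshold, and weights are illustrative, not part of any library.

```python
from collections import Counter

import numpy as np


def shot_boundary_keyframes(frames, threshold=30.0):
    """Step 1: return indices of frames to screenshot.

    Keeps frame 0, then every frame whose mean absolute pixel difference from
    the previous frame exceeds `threshold` (assumed value; tune per dataset),
    which is treated as a shot change.
    frames: array of shape (T, H, W) or (T, H, W, C), pixel values 0-255.
    """
    frames = np.asarray(frames, dtype=np.float32)
    if len(frames) == 0:
        return []
    keyframes = [0]
    for t in range(1, len(frames)):
        diff = float(np.mean(np.abs(frames[t] - frames[t - 1])))
        if diff > threshold:
            keyframes.append(t)
    return keyframes


def fuse_keywords(frame_labels, transcript_keywords,
                  visual_weight=1.0, text_weight=1.0, top_k=5):
    """Step 5: merge keywords from keyframes (step 2) and speech text (steps 3-4).

    Each occurrence adds its source weight to the keyword's score; the top_k
    highest-scoring keywords are returned (ties keep first-seen order).
    """
    scores = Counter()
    for labels in frame_labels:
        for label in labels:
            scores[label] += visual_weight
    for word in transcript_keywords:
        scores[word] += text_weight
    return [word for word, _ in scores.most_common(top_k)]


# Usage with synthetic data: three black frames, then three bright frames.
frames = np.concatenate([np.zeros((3, 4, 4)), np.full((3, 4, 4), 200.0)])
keyframes = shot_boundary_keyframes(frames)
print(keyframes)  # → [0, 3]

# Hypothetical outputs of steps 2-4 for keyframes 0 and 3 and the audio track.
frame_labels = [["kitchen", "cup"], ["cup", "water"]]
transcript_keywords = ["pour", "water", "cup"]
print(fuse_keywords(frame_labels, transcript_keywords, top_k=2))  # → ['cup', 'water']
```

Capturing only at shot changes keeps the number of frames sent to the (expensive) recognition step small, at the cost of missing changes within a single shot; capturing every frame is the opposite trade-off.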
Copyright notice
This article was written by [Lao wa next door]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231923218641.html