Video understanding
2022-04-23 19:30:00 【Lao wa next door】
Video understanding is an important task in computer vision. In recent years, deep learning, and supervised learning in particular, has driven rapid progress in the field: tasks such as video action classification and video clip summarization have achieved remarkable results.
However, in many real-world scenarios a single label per video clip does not carry enough information. For example, when a robot pours water into a cup, a simple "pouring liquid" label is not enough to predict when the cup will be full; the robot needs to track the amount of water in the cup frame by frame. Similarly, in sports analysis, a baseball coach does not just want to find the pitch, but wants to pinpoint the exact moment the ball leaves the pitcher's hand, and a single video-level label is not enough for such a retrieval task. This means the video understanding model needs the ability to understand video frame by frame.
Supervised learning makes this very expensive, however: it requires fine-grained annotation of the action in every frame, and each new action requires new labels to provide the supervision signal. Yet from robotics to sports analysis there is strong demand for fine-grained video understanding. So how can a model learn fine-grained information from video without requiring a large number of labels?
Researchers from Google have proposed a self-supervised method called Temporal Cycle-Consistency Learning (TCC). It achieves fine-grained temporal video understanding by learning representations of the same process across different videos, and offers a new approach to frame-by-frame video retrieval, action analysis, video synchronization, and cross-modal transfer.
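To make the idea of cycle consistency concrete, here is a minimal sketch, assuming each frame has already been mapped to an embedding vector. It only performs the hard nearest-neighbor check (frame in video U, to its nearest frame in video V, and back to U); it is not the training loss used by TCC, which relies on a differentiable version of this step. The function names and the toy embeddings are hypothetical.

```python
import math

def nearest(query, candidates):
    """Index of the candidate embedding closest to query (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda j: math.dist(query, candidates[j]))

def cycle_consistent_frames(emb_u, emb_v):
    """Indices i of video U whose nearest neighbor in V maps back to i."""
    consistent = []
    for i, u in enumerate(emb_u):
        j = nearest(u, emb_v)         # step U -> V
        k = nearest(emb_v[j], emb_u)  # step V -> U
        if k == i:                    # the cycle returned to its starting frame
            consistent.append(i)
    return consistent

# Toy per-frame embeddings (made-up values) for two videos of the same process.
video_u = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
video_v = [(0.2, 0.0), (2.1, 0.0), (3.1, 0.0)]

result = cycle_consistent_frames(video_u, video_v)
print(result)
```

In this toy case frame 1 of `video_u` is not cycle-consistent: its nearest neighbor in `video_v` is frame 0, which maps back to frame 0 of `video_u` rather than frame 1. A good embedding should make as many frames as possible complete the cycle.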
Understanding dynamic behavior in video is a key direction for the future development of AI.
Video behavior understanding includes video classification, action recognition, temporal action detection, and video summarization.
I recently organized the papers I have read, mainly on video classification, action recognition, and video datasets, listed the best reported results on the relevant datasets, and shared them on GitHub.
On the HMDB51 dataset, the DOVF+MIFS method achieves the highest accuracy, 75%; there is still much room for improvement on this dataset;
On the UCF101 dataset, the TLE method achieves the highest accuracy, 95.6%;
On the ActivityNet dataset, the UntrimmedNet (hard) method achieves the best result, 91.3%;
On the Sports-1M dataset, the LSTM+Pretrained on YT-8M method achieves the best Hit@1 and Hit@5, 74.2% and 92.4% respectively, with an mAP of 67.6%;
On the YouTube-8M dataset, the WILLOW team's method achieves the best result, 84.967%.
Awesome Video Understanding
2. Keyword extraction for the semantics of video content:
1) Capture screenshots of video frames:
- Some approaches capture every frame;
- Some capture a new screenshot only when the shot changes. How can a shot change be detected? Compute the difference between two consecutive frames; if the difference is large, the shot has changed and a new screenshot is needed.
2) Recognize the semantics of the screenshots;
3) Convert the video's speech into text;
4) Recognize the semantics of the text;
5) Combine the semantics obtained from the screenshots with the semantics obtained from the text; the result is the semantics of the video.
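The shot-change rule in step 1) can be sketched as follows. This is a minimal illustration, assuming frames are already decoded into small grayscale grids (lists of rows of pixel values); the mean absolute difference measure, the `threshold` value of 30, and the function names are assumptions chosen for the example, not something specified in the article.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equally sized grayscale frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count

def select_keyframes(frames, threshold=30.0):
    """Indices to screenshot: the first frame, plus each frame that starts a new shot."""
    if not frames:
        return []
    keyframes = [0]
    for i in range(1, len(frames)):
        # A large difference from the previous frame is treated as a shot change.
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes

def flat_frame(value, height=2, width=2):
    """Hypothetical helper: a frame filled with one gray level."""
    return [[value] * width for _ in range(height)]

# Three "shots": dark (frames 0-1), bright (frames 2-3), mid-gray (frame 4).
frames = [flat_frame(10), flat_frame(12), flat_frame(200), flat_frame(198), flat_frame(50)]
keyframes = select_keyframes(frames)
print(keyframes)
```

The threshold trades off sensitivity against redundancy: a lower value yields more screenshots (and may fire on camera motion), while a higher value may miss gradual transitions such as fades.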
Copyright notice
This article was written by [Lao wa next door]. Please include the original link when reposting. Thank you.
https://yzsam.com/2022/04/202204231923218641.html