Video understanding
2022-04-23 19:30:00 【Lao wa next door】
Video understanding is an important task in computer vision. In recent years, deep learning, and supervised learning in particular, has driven rapid progress: tasks such as video action classification and video clip summarization have achieved remarkable results.
In many real-world scenarios, however, a single label per video clip does not carry enough information. For instance, when a robot pours water into a cup, a simple "pour liquid" label is not enough to predict when the cup is full; the robot needs to track the amount of water in the cup frame by frame. Similarly, in sports analysis, a baseball coach does not just want to see the pitch but wants to pinpoint the exact moment the ball leaves the pitcher's hand, and a single video-level label cannot support that kind of retrieval. The video understanding model therefore needs to understand video frame by frame.
With supervised learning, this becomes very expensive: every frame in the video needs a fine-grained action annotation, and each new action requires new labels to provide a supervision signal. Yet from robotics to sports analysis there is strong demand for fine-grained video understanding. So how can a model learn fine-grained information from video without requiring a large number of labels?
Researchers from Google have proposed a self-supervised method called Temporal Cycle-Consistency Learning (TCC). It achieves fine-grained temporal video understanding by learning representations of the same process across different samples, and offers a new approach to frame-by-frame video retrieval, action analysis, video synchronization, and cross-modal transfer.
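To make the cycle-consistency idea concrete, here is a minimal sketch of the underlying check: a frame in one video is matched to its nearest neighbor in a second video, and that neighbor is matched back to the first video; the frame is cycle-consistent if the round trip returns to where it started. This is only the hard nearest-neighbor check on precomputed embeddings, not the differentiable training loss used in TCC. The function names and toy 2-D embeddings are illustrative assumptions.

```python
import math


def nearest(query, candidates):
    """Index of the candidate embedding closest to `query` (Euclidean distance)."""
    return min(range(len(candidates)), key=lambda j: math.dist(query, candidates[j]))


def cycle_consistent_frames(video_u, video_v):
    """Indices i of video_u whose nearest neighbor in video_v maps back to i.

    video_u, video_v: lists of per-frame embeddings (lists of floats).
    Both names are hypothetical; real embeddings would come from an encoder.
    """
    consistent = []
    for i, emb in enumerate(video_u):
        j = nearest(emb, video_v)          # step 1: frame i in U -> frame j in V
        k = nearest(video_v[j], video_u)   # step 2: frame j in V -> frame k in U
        if k == i:                         # the cycle closed on the starting frame
            consistent.append(i)
    return consistent


# Toy embeddings: frame 2 of `u` cycles back to frame 1, so it is not consistent.
u = [[0.0, 0.0], [1.0, 0.0], [1.2, 0.0]]
v = [[0.0, 0.0], [0.9, 0.0]]
print(cycle_consistent_frames(u, v))
```

Training an encoder so that as many frames as possible pass this check is the intuition behind the method; the published approach replaces the hard nearest-neighbor step with a soft, differentiable one.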
Understanding dynamic behavior in video is a key direction for the future development of AI.
Video behavior understanding includes video classification, action recognition, temporal action detection, and video summary generation.
I recently organized the papers I have read, mainly on video classification, action recognition, and video datasets, listed the best results on the relevant datasets, and shared them on GitHub.
On the HMDB51 dataset, the DOVF+MIFS method achieves the highest accuracy, 75%; there is still much room for improvement on this dataset;
On the UCF101 dataset, the TLE method achieves the highest accuracy, 95.6%;
On the ActivityNet dataset, the UntrimmedNet (hard) method achieves the best result, 91.3%;
On the Sports-1M dataset, the LSTM+Pretrained on YT-8M method achieves the best Hit@1 and Hit@5, 74.2% and 92.4% respectively, with an mAP of 67.6%;
On the YouTube-8M dataset, the WILLOW team's method achieves the best result, 84.967%.
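For readers unfamiliar with the Hit@k metric, the following is a small sketch of how it is commonly computed: a sample counts as a hit if at least one of its ground-truth labels appears among the k highest-scoring classes. The function name, the score matrix, and the labels are made-up illustrations and are not taken from any of the papers above.

```python
def hit_at_k(scores, true_labels, k):
    """Fraction of samples with at least one true label among the top-k classes.

    scores: list of per-sample class score lists.
    true_labels: list of sets of ground-truth class indices, one set per sample.
    """
    hits = 0
    for sample_scores, labels in zip(scores, true_labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if set(ranked[:k]) & set(labels):
            hits += 1
    return hits / len(scores)


# Toy example with three samples and three classes.
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [{1}, {2}, {1}]
print(hit_at_k(scores, labels, 1))  # one of three samples is a hit
print(hit_at_k(scores, labels, 2))  # two of three samples are hits
```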
Awesome Video Understanding
2. Keyword extraction for the semantics of video content:
1) Capture screenshots (frames) from the video:
- Some approaches capture every frame;
- Others capture a new screenshot only when the shot changes. How to tell whether the shot has changed? Compute the difference between two consecutive frames; a large difference means the shot has changed and a new screenshot is needed.
2) Recognize the semantics of each screenshot;
3) Convert the video's speech into text;
4) Recognize the semantics of the text;
5) Merge the semantics obtained from the screenshots with the semantics obtained from the text; the result is the semantics of the video.
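The shot-change test in step 1 can be sketched as follows. This is a minimal illustration, assuming each frame is already decoded into a flat list of grayscale pixel values (0 to 255); the mean absolute difference measure, the function names, and the threshold of 30 are assumptions chosen for the example, not values from the original text.

```python
def mean_abs_diff(frame_a, frame_b):
    """Average absolute per-pixel difference between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(frame_a, frame_b)) / len(frame_a)


def select_keyframes(frames, threshold=30.0):
    """Indices of frames to screenshot: the first frame, plus every frame that
    differs from its predecessor by more than `threshold` (a likely shot change).

    `threshold` is a hypothetical value and would need tuning on real footage.
    """
    if not frames:
        return []
    keyframes = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i - 1], frames[i]) > threshold:
            keyframes.append(i)
    return keyframes


# Toy "video": five 4-pixel frames with abrupt changes at frames 2 and 4.
frames = [[10] * 4, [12] * 4, [200] * 4, [198] * 4, [50] * 4]
print(select_keyframes(frames))
```

Comparing raw pixels is the simplest possible measure; comparing color histograms is a common alternative that is less sensitive to camera or object motion within a shot.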
Copyright notice
This article was written by [Lao wa next door]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/04/202204231923218641.html