当前位置:网站首页>Video understanding

Video understanding

2022-04-23 19:30:00 Lao wa next door

Video understanding is an important task in computer vision , In recent years, with the application of deep learning, especially supervised learning, video understanding has made rapid development , For example, tasks such as video behavior classification and video clip summary have achieved remarkable results .

however , In real life, the application video clips of many scenes need more than one tag to provide enough information . for instance , The robot poured water into the cup , A simple “ Pour liquid ” The label is not enough to predict when the cup is full , The robot needs to track the amount of water in the cup frame by frame . Again for instance , In the field of motion analysis , Baseball coaches don't just want to see the pitch , But to accurately analyze the moment when the pitcher throws the baseball away from his hand , A single video tag is not enough to complete such a video retrieval task . This means that the video understanding model needs the ability to understand video frame by frame .

However, if the method of supervised learning is used, the learning cost will become very expensive , This requires fine-grained annotation of the actions of each frame in the video , Training different movements also requires new labels to provide supervision signals . But from robotics to motion analysis , There is a strong demand for fine-grained video understanding , that How to learn video to understand fine-grained information without requiring a large number of tags

Researchers from Google have proposed a method called Time cycle consistent learning (Temporal Cycle-Consistency Learning,TCC) Self-monitoring method . Fine grained time-domain video understanding is realized by learning the representation of similar processes of different samples , For frame by frame video retrieval 、 Action analysis 、 Video synchronization and multimodal migration provide a new solution .

Understanding the dynamic behavior in video is AI The key direction of future development .

Video behavior understanding includes video classification 、 Action recognition 、 Temporal behavior detection and video summary generation .

Recently, I sorted out the papers I read , Mainly video classification 、 Motion recognition and video data sets , The best level on the relevant data set is listed , Share in GitHub.

HMDB51 On dataset ,DOVF+MIFS The highest level of accuracy of the method is 75%, There is still much room for performance improvement on this dataset ;
UCF101 On dataset ,TLE The highest accuracy of the method is 95.6%;
ActivityNet On dataset ,UntrimmedNet (hard) The highest level obtained by the method is 91.3%;
Sports-1M On dataset ,LSTM+Pretrained on YT-8M Methods to obtain the highest level of [email protected] and [email protected], Respectively 74.2% and 92.4%,mAP by 67.6%;
YouTube-8M On dataset ,WILLOW The team approach achieves the highest level of 84.967%.
Awesome Video Understanding

2. Keyword extraction for the semantics of video content ;
1) Frame screenshot of video :

l Some cut every frame ;

l Some are cut again when the lens is switched , How to judge whether the video shot is converted ? Calculate the difference between the two pictures , Much worse , It means that the lens has changed , Need to take another screenshot .

2) Identify the semantic screenshot of ;

3) Convert the voice of the video into text ;

4) Semantic recognition of characters

5) Integrate the semantics obtained from the above screenshot with the semantics obtained from the text , That's the semantics of this video ;

版权声明
本文为[Lao wa next door]所创,转载请带上原文链接,感谢
https://yzsam.com/2022/04/202204231923218641.html