MIT labels every pixel in the world without supervision. Humans: no more spending 800 hours to annotate one hour of video

2022-04-23 11:10:00 【Zhiyuan community】

Timed with the ICLR 2022 award announcements, MIT, Cornell, Google, and Microsoft "showed off" a new SOTA: labeling every pixel in the world, with no manual work required!

Paper: https://arxiv.org/abs/2203.08414

Judging from the comparison images, the method is sometimes even more detailed than manual annotation; even the shadows are labeled.

Unfortunately, although it looks very cool, it did not make the award shortlist (nominations included).

Back to the CV field: the problem of labeling data has in fact plagued academia for a long time.

For humans, whether it is an avocado, mashed potatoes, or even an "alien mothership", a single glance is enough to recognize it.

But for machines, it is not that simple.

To build a training dataset, the specific content in each image has to be outlined, and at present this can only be done by hand.

For example, given a dog sitting on grass, you first need to outline the dog and label it "dog", then label the patch of ground behind it "grass".

A model trained on such labels can then tell "dog" and "grass" apart.

And this work is very tedious.

If you skip it, the model struggles to recognize objects, people, or other important image features.

If you do it, it is a great deal of trouble.

For human annotators, segmenting an image takes about 100 times more effort than classification or object detection.

Labeling just 1 hour of data takes 800 hours.

Data annotators: am I about to "graduate" too?

To spare humans the torture of labeling (and, of course, mainly to advance the technology), the group of scientists mentioned above proposed "STEGO", a new Transformer-based method that performs semantic image segmentation without supervision.

The goal of unsupervised semantic segmentation is to discover and localize semantic categories within an image corpus without any form of annotation.

To solve this problem, the STEGO algorithm must produce features for every pixel that are semantically meaningful and compact enough to form distinct clusters.

Unlike previous end-to-end models, STEGO separates feature learning from clustering. It looks for similar images that appear across the entire dataset, then associates the similar objects in them to predict pixel-level labels.
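To make the "features first, clustering second" idea concrete, here is a minimal sketch of the second stage only. It is not the authors' implementation: the function name `cluster_pixel_features`, the toy feature map, and the choice of a plain cosine-similarity k-means in NumPy are all assumptions made for illustration. In STEGO the per-pixel features would come from the learned model; here they are simply random toy vectors.

```python
import numpy as np

def cluster_pixel_features(features, k, n_iters=20, seed=0):
    """Assign each pixel to one of k clusters (hypothetical helper).

    features: array of shape (H, W, C), one feature vector per pixel.
    Returns an (H, W) array of integer cluster ids.
    """
    h, w, c = features.shape
    x = features.reshape(-1, c).astype(np.float64)
    # L2-normalise so that dot products are cosine similarities.
    x /= np.linalg.norm(x, axis=1, keepdims=True) + 1e-8

    # Initialise the cluster centres from k distinct random pixels.
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]

    for _ in range(n_iters):
        # Assign every pixel to its most similar centre.
        labels = np.argmax(x @ centers.T, axis=1)
        # Move each centre to the normalised mean of its members.
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:
                mean = members.mean(axis=0)
                centers[j] = mean / (np.linalg.norm(mean) + 1e-8)

    labels = np.argmax(x @ centers.T, axis=1)
    return labels.reshape(h, w)

# Toy usage: an 8x8 "image" with 16-dimensional random features per pixel.
toy_features = np.random.default_rng(0).normal(size=(8, 8, 16))
label_map = cluster_pixel_features(toy_features, k=3)
print(label_map.shape)
```

The cluster ids produced this way carry no names; turning them into categories such as "dog" or "grass" still requires matching clusters to labels after the fact.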

On the CocoStuff dataset, the unsupervised semantic segmentation task covers 27 categories (including ground, sky, buildings, lawn, vehicles, people, animals, and so on).

The baseline for comparison is the PiCIE method proposed by Cho et al. in 2021. The result images show that STEGO's semantic segmentation predictions retain local detail without missing key objects.

