Improving Few-Shot Part Segmentation Using Coarse Supervision

2022-04-22 20:20:00 【wuling129】

Abstract

Training part segmentation networks requires detailed annotations, and the cost of collecting them is a major bottleneck. To address this, this paper presents a framework that exploits coarse labels that are easy to obtain, such as figure-ground masks and keypoint locations, to improve part segmentation models. The first challenge is that coarse labels come from different tasks and have different label styles, so they are hard to map directly onto part labels. We therefore propose to jointly learn the part segmentation model and the dependencies between part labels and each coarse label style with deep networks, so that existing coarse annotations can be put to use. To evaluate the approach, we develop a benchmark on the Caltech-UCSD Birds and OID Aircraft datasets. Experiments show that our method outperforms baselines based on multi-task learning and semi-supervised learning, as well as competing methods that rely on hand-designed loss functions to exploit sparse supervision.

1. Introduction

Accurate models for labeling parts have many applications. They support fine-grained recognition tasks such as estimating the shape and size of animals and identifying species, and they enable graphics applications such as image editing and animation. A significant bottleneck is the cost of collecting the annotations needed to supervise network training. In many cases, however, datasets come with alternative labels, such as object bounding boxes, figure-ground masks, or keypoints, that are relatively easy to obtain and can serve as a source of supervision. The level of detail and the structure of these labels often differ from those of part labels: bounding boxes and masks are coarser than part labels, and keypoints are too sparse. As a result, they cannot easily be "translated" into part labels and used to supervise learning directly.
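To make the one-way nature of this relationship concrete, here is a minimal Python sketch. It shows that coarse labels (a figure-ground mask and a bounding box) are trivial to derive from a part label map, while different part labelings can collapse to the same coarse labels, which is why the reverse translation is ambiguous. The function name, the part ids, and the tiny label maps are all hypothetical and chosen only for illustration.

```python
def coarse_labels_from_parts(part_map, background=0):
    """Derive a figure-ground mask and an inclusive bounding box
    (top, left, bottom, right) from a 2D part label map."""
    mask = [[int(label != background) for label in row] for row in part_map]
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    box = (min(rows), min(cols), max(rows), max(cols)) if rows else None
    return mask, box


# Two different part labelings (hypothetical ids: 0 = background, 1 = head, 2 = body).
parts_a = [[0, 1, 1, 0],
           [0, 2, 2, 0],
           [0, 2, 2, 0]]
parts_b = [[0, 1, 1, 0],
           [0, 1, 1, 0],
           [0, 2, 2, 0]]

mask_a, box_a = coarse_labels_from_parts(parts_a)
mask_b, box_b = coarse_labels_from_parts(parts_b)

# The part maps differ, yet both coarse labels coincide.
print(parts_a != parts_b, mask_a == mask_b, box_a == box_b)
```

Because the mapping from parts to coarse labels is many-to-one, a mask or box alone cannot pin down the part labels; additional modeling is needed to use them as supervision.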

This paper proposes a framework that trains part segmentation models using the coarse labels that already come with a dataset, such as figure-ground masks and keypoint locations. The basic principle of the framework is illustrated in Figure 1.

We treat the part labels as latent variables and, in a Bayesian setting, jointly learn the part segmentation model and the unknown dependencies between part labels and each coarse label style (see Section 3). The relationship between coarse labels and part segmentations is modeled by deep neural networks, so that coarse labels can supervise network training. A technical challenge is that Bayesian inference requires sampling from a high-dimensional latent distribution, which is generally intractable. We address this by making certain conditional independence assumptions and by developing an amortized inference procedure for learning. Our method allows training with off-the-shelf image segmentation networks and standard back-propagation machinery.
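The latent-variable view can be written down generically as follows. This is a standard variational sketch, not the paper's exact objective: the notation ($x$ for the image, $y^p$ for the latent part labels, $y^c$ for an observed coarse label, $q$ for the amortized approximate posterior, $i$ for a pixel index) and the per-pixel factorization in the last line are assumptions made here for illustration.

```latex
% Notation (assumed for illustration): x = image, y^p = latent part labels,
% y^c = observed coarse label, q = amortized approximate posterior.
\begin{align}
\log p(y^c \mid x)
  &= \log \sum_{y^p} p(y^c \mid y^p, x)\, p(y^p \mid x) \\
  &\ge \mathbb{E}_{q(y^p \mid x, y^c)}\big[\log p(y^c \mid y^p, x)\big]
     - \mathrm{KL}\big(q(y^p \mid x, y^c)\,\|\,p(y^p \mid x)\big), \\
q(y^p \mid x, y^c) &= \prod_{i} q_i(y^p_i \mid x, y^c).
\end{align}
```

The sum in the first line ranges over all possible part labelings, which is what makes exact inference intractable. A factorized $q$ predicted by a network in a single forward pass is one common way to amortize that cost; how the paper actually factorizes the distribution is described in its Section 3.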

In summary, the contributions of the paper are: 1) a framework for learning part segmentation models from the various kinds of coarse supervision available in existing datasets; 2) an efficient amortized inference procedure that is about 3 times slower than leading coarse-supervision methods (for example, PointSup [4]) but more accurate; 3) two benchmarks for evaluating part segmentation from a few labeled examples on the CUB and OID Aircraft datasets; 4) a systematic evaluation of design choices, including the effect of initialization on transfer learning and the relative benefits of different forms of coarse labels.

2. Related work

2.1 Weakly supervised image segmentation

Previous work has used supervision from classification labels, bounding boxes, or sparse locations in the image (such as points or scribbles). Zhou et al. [33] use image-level class labels as supervision and exploit peaks in the class response maps so that a classification network can extract instance segmentation masks, while [1, 34] use image classification models to generate pseudo ground-truth labels. Khoreva et al. [15] use bounding boxes as weak supervision: they apply classical methods such as GrabCut [22] within a given bounding box to generate pseudo ground truth and use it to train the segmentation model. Hsu et al. [13] exploit bounding-box tightness to train a Mask-RCNN [10], treating horizontal and vertical patches inside the tight bounding box as positive signals and patches outside it as negative signals. BoxInst [25] uses a projection loss, which forces horizontal and vertical lines inside the bounding box to contain at least one predicted foreground pixel, and an affinity loss, which forces pixels with similar colors to take the same label. Laradji et al. [17] introduce a proposal-based instance segmentation method that uses one point per instance as supervision. Cheng et al. [4] train a Mask-RCNN model using multiple points randomly sampled from each instance together with bounding boxes. ScribbleSup [18] uses a graphical model to propagate information from scribbles to unlabeled pixels in order to learn the network parameters. Another line of work [3, 35] trains two models simultaneously and cross-supervises one with the other. Naha et al. [19] use keypoint guidance to predict part segmentation labels for unseen categories, but require keypoint input at evaluation time.
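The projection idea attributed to BoxInst above can be sketched in a few lines of plain Python. This is a simplified illustration, not the BoxInst implementation: the function names are hypothetical, masks are nested lists rather than tensors, and the use of a Dice loss to compare projections is an assumption of this sketch.

```python
def project(mask):
    """Max-project a 2D mask onto its rows and onto its columns."""
    onto_rows = [max(row) for row in mask]        # one value per image row
    onto_cols = [max(col) for col in zip(*mask)]  # one value per image column
    return onto_rows, onto_cols


def dice_loss(pred, target, eps=1e-6):
    """Dice loss between two 1D sequences with values in [0, 1]."""
    inter = sum(p * t for p, t in zip(pred, target))
    denom = sum(p * p for p in pred) + sum(t * t for t in target)
    return 1.0 - (2.0 * inter + eps) / (denom + eps)


def box_to_mask(box, height, width):
    """Rasterize an inclusive box (top, left, bottom, right) as a binary mask."""
    top, left, bottom, right = box
    return [[1.0 if top <= r <= bottom and left <= c <= right else 0.0
             for c in range(width)] for r in range(height)]


def projection_loss(pred_mask, box):
    """Penalize predictions whose axis projections disagree with the box's."""
    height, width = len(pred_mask), len(pred_mask[0])
    box_rows, box_cols = project(box_to_mask(box, height, width))
    pred_rows, pred_cols = project(pred_mask)
    return dice_loss(pred_rows, box_rows) + dice_loss(pred_cols, box_cols)


box = (1, 1, 2, 3)
# Covers every row and column spanned by the box without filling it.
tight = [[0, 0, 0, 0, 0],
         [0, 1, 1, 0, 0],
         [0, 0, 1, 1, 0],
         [0, 0, 0, 0, 0]]
# A single pixel: most lines through the box contain no foreground.
loose = [[0, 0, 0, 0, 0],
         [0, 1, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]

print(projection_loss(tight, box), projection_loss(loose, box))
```

The `tight` mask incurs essentially no penalty even though it does not fill the box, which shows why such a loss constrains only the extent of the object and needs a complementary term (the affinity loss in the description above) to shape the interior.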

All of these methods design an algorithm specific to a single kind of supervision, in which the annotation style is explicitly mapped to the desired part labels. In contrast, our method handles a variety of label styles and makes it possible to use existing datasets to learn part segmentation.