Classifier Adaptation at Prediction Time

by Amelie Royer & Christoph H. Lampert
CVPR, 2015

This paper aims to improve image classification accuracy with an adaptive component that can be added onto any pre-existing classifier. Rather than assuming that the test data are i.i.d., it assumes that the test instances are semantically related.

The first part of the paper describes a classifier adaptation system that relies on estimating class proportions in the test dataset, using a symmetric Dirichlet distribution as the prior. The “n” term of this distribution is calculated based on the prediction scenario (online, bandit feedback, or unsupervised) and on whether the distribution varies over time. For time-varying distributions, the authors use the same terms as in the non-time-varying cases, except that a sliding window is used to allow for variation.
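To make the idea above concrete, here is a minimal sketch of such an estimator. It is not the authors' code: it assumes the common smoothed-count form (count + alpha) / (total + K * alpha) for the class proportions, and it assumes the adapted output is obtained by re-weighting the base classifier's posterior by the ratio of estimated test prior to training prior. The names `PriorEstimator`, `adapt`, `alpha`, and `window` are hypothetical.

```python
from collections import deque


class PriorEstimator:
    """Running estimate of test-time class proportions (illustrative only)."""

    def __init__(self, num_classes, alpha=1.0, window=None):
        self.k = num_classes
        self.alpha = alpha  # symmetric Dirichlet pseudo-count (assumed form)
        # window=None keeps the full history; an integer gives a sliding window.
        self.history = deque(maxlen=window)

    def update(self, label):
        # In the online scenario this would be the true label; in the
        # unsupervised scenario it would be the classifier's own prediction.
        self.history.append(label)

    def proportions(self):
        counts = [0] * self.k
        for y in self.history:
            counts[y] += 1
        total = len(self.history) + self.k * self.alpha
        return [(c + self.alpha) / total for c in counts]


def adapt(posterior, train_prior, test_prior):
    """Re-weight a posterior by test_prior / train_prior and renormalize."""
    weighted = [p * q / r for p, q, r in zip(posterior, test_prior, train_prior)]
    z = sum(weighted)
    return [w / z for w in weighted]


# Usage: three observations of class 0, then adapt an undecided posterior.
est = PriorEstimator(num_classes=2, alpha=1.0)
for y in (0, 0, 0):
    est.update(y)
adapted = adapt([0.5, 0.5], train_prior=[0.5, 0.5], test_prior=est.proportions())
```

With a sliding window, older labels fall out of the estimate, which is what lets the estimated proportions follow a distribution that changes over time.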

The second half of the paper proposes a way to test adaptive systems that does not rely on i.i.d. test data. Specifically, the paper focuses on different methods for generating non-i.i.d. datasets. The proposed methods use random walks through WordNet, computed with either a multidimensional scaling or a kernelized sorting process, and allow for random jumps between contexts. The authors also use noun sequences from books on Project Gutenberg to generate additional image sequences to test with. The paper doesn’t make clear the benefits of using the two word embeddings over using the calculated distances directly. It also seems that the fictional books and older texts found on Project Gutenberg would produce very artificial sequences whose noun-class distributions differ from those seen in the world today. It would be interesting to see the approaches in the paper applied to more realistic datasets, such as first-person video capturing common tasks and situations.
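The sequence generation described above can be pictured with a toy random walk over a class graph that occasionally jumps to an unrelated context. This is only a sketch: the graph `toy_graph` is a made-up stand-in for WordNet, the walk moves along hand-written neighbor lists instead of an embedding computed by multidimensional scaling or kernelized sorting, and `context_walk` and `jump_prob` are hypothetical names.

```python
import random


def context_walk(neighbors, length, jump_prob=0.1, seed=0):
    """Generate a class sequence by walking a graph with occasional jumps."""
    rng = random.Random(seed)
    classes = sorted(neighbors)
    current = rng.choice(classes)
    sequence = [current]
    while len(sequence) < length:
        if rng.random() < jump_prob:
            # Random jump: switch to an arbitrary class (a new context).
            current = rng.choice(classes)
        else:
            # Local step: move to a semantically related class.
            current = rng.choice(neighbors[current])
        sequence.append(current)
    return sequence


# Hypothetical toy graph with an "animal" context and a "vehicle" context.
toy_graph = {
    "dog": ["cat", "wolf"],
    "cat": ["dog", "lion"],
    "wolf": ["dog"],
    "lion": ["cat"],
    "car": ["truck"],
    "truck": ["car"],
}

sequence = context_walk(toy_graph, length=20, jump_prob=0.1, seed=0)
```

Each class name in the resulting sequence would then be replaced by an image of that class, giving a test stream in which nearby instances tend to share a context.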

In the unsupervised prediction case, feeding the predicted outputs back in as a prior, as suggested in the paper, can lead the system into overconfidence for a few categories. Since the test instances are not assumed to be i.i.d., it is possible that instances from a particular class will come first in the test set. This would make the estimated class frequency high for that class, so the classifier will almost never predict other classes after adapting to these few instances, which would be problematic.
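The concern above can be illustrated with a small simulation. It is a sketch under stated assumptions, not a reproduction of the paper's method or results: hard predicted labels are fed back as counts, the training prior is taken to be uniform, the smoothed-count estimate (count + alpha) / (total + K * alpha) is assumed, and the posteriors in `stream` are invented. The name `run_unsupervised` is hypothetical.

```python
def adapt(posterior, prior):
    """Re-weight a posterior by an estimated prior (uniform training prior assumed)."""
    weighted = [p * q for p, q in zip(posterior, prior)]
    z = sum(weighted)
    return [w / z for w in weighted]


def run_unsupervised(posteriors, num_classes, alpha=1.0):
    """Feed the classifier's own adapted predictions back into the prior estimate."""
    counts = [0] * num_classes
    predictions = []
    for post in posteriors:
        total = sum(counts) + num_classes * alpha
        prior = [(c + alpha) / total for c in counts]
        adapted = adapt(post, prior)
        y_hat = max(range(num_classes), key=lambda y: adapted[y])
        predictions.append(y_hat)
        counts[y_hat] += 1  # feedback: the prediction is treated as a label
    return predictions


# Invented stream: 20 confident class-0 instances, then 5 instances that the
# base classifier alone would assign to class 1 (posterior 0.7).
stream = [[0.9, 0.1]] * 20 + [[0.3, 0.7]] * 5
predictions = run_unsupervised(stream, num_classes=2)
```

In this toy run, the last five instances are still labeled as class 0, because the estimated prior for class 0 has grown to 21/22 and outweighs the base classifier's 0.7 posterior for class 1. A sliding window or a larger `alpha` would soften the effect, but the example shows how an early run of one class can lock the predictions in.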

The experiments in the paper compare the performance of these adaptive classifiers with that of standard, non-adaptive classifiers (trained CNN and SVM models) on the various types of generated image sequences. The results show that the adaptive classifiers worked better on all image sequences except a sequence of random images, for which the baseline classifiers worked best.

Since the online prediction scenario has access to the labels for all previous test instances, it would be interesting to see a comparison of the methods proposed in this paper with continuing to run SGD on the test instances.

This paper shows promising results for changing the prior distributions of classes to better match a situation; however, feature adaptation can also be a useful approach. It isn’t clear whether people use such priors. For example, a foreign object could appear in a scene, and a human would be able to identify it even if adapted to the current objects. Contextual priors, on the other hand, are used heavily, such as when disambiguating a low-resolution patch based on what is around it. Using scene-specific models is useful, but achieving this through modifying the class priors seems superficial.

