Covariate shift detection

2017-07-17 11:40:29

Is there any standard approach for detecting covariate shift between training and test data? This would be useful for validating the assumption that covariate shift exists in my dataset, which contains a few hundred images.

  • There are methods such as the Kullback-Leibler divergence and the Wald-Wolfowitz test for detecting non-randomness and covariate shift.

    A simple test for a quick analysis would be to build a machine learning model that is trained to distinguish between the training data and the production data.

    If the model can tell the difference between the training and production datasets, that can be a sign of covariate shift.

    2017-07-17 11:51:58
  • Adaptive learning with covariate shift-detection for motor imagery-based brain–computer interface

    http://link.springer.com/article/10.1007/s00500-015-1937-5

    EWMA model based shift-detection methods for detecting covariate shifts in non-stationary environments (http://www.sciencedirect.com/science/article/pii/S0031320314002878)

    2017-07-17 11:58:54
  • Here is a simple procedure you can use:

    learn a classifier to distinguish between train/test data (using the regular X features)

    compute the phi correlation coefficient to estimate the quality of the classifier, i.e. the separability of the train/test data

    set a threshold (e.g. 0.2) above which you can claim there is a covariate shift (and start looking at corrections)

    2017-07-17 12:30:57
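The procedure above can be sketched with scikit-learn. This is a minimal sketch on synthetic data: the shifted Gaussians, the random-forest classifier, and the 0.2 threshold are illustrative choices, not part of the original answer. Note that for binary labels the phi coefficient equals the Matthews correlation coefficient.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import matthews_corrcoef
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 5))   # stand-in for the training features
X_test = rng.normal(0.5, 1.0, size=(500, 5))    # stand-in for shifted test features

# Step 1: label train rows 0 and test rows 1, then learn to tell them apart.
X = np.vstack([X_train, X_test])
y = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])

clf = RandomForestClassifier(n_estimators=100, random_state=0)
pred = cross_val_predict(clf, X, y, cv=5)

# Step 2: the phi coefficient (= Matthews correlation coefficient for
# binary labels) measures how separable the two sets are; near 0 means
# the classifier cannot distinguish them.
phi = matthews_corrcoef(y, pred)
print(f"phi = {phi:.2f}")

# Step 3: threshold from the answer above.
if phi > 0.2:
    print("Covariate shift suspected")
```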
  • You don't give many clues about what properties of the images you might be considering, but it seems that what you want to measure is the difference between the distributions of the training and test sets. A useful place to start would be the Kullback–Leibler divergence, which is a measure of the difference between two distributions.

    2017-07-17 12:46:02
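A minimal sketch of the KL-divergence comparison, assuming SciPy is available. The per-image scalar feature and the bin count are illustrative placeholders; with images you would first pick some summary feature (mean intensity, an embedding coordinate, etc.) to histogram.

```python
import numpy as np
from scipy.stats import entropy  # entropy(p, q) computes KL(p || q)

rng = np.random.default_rng(1)
train_feature = rng.normal(0.0, 1.0, 1000)  # e.g. mean pixel intensity per image
test_feature = rng.normal(0.4, 1.0, 1000)   # shifted stand-in for the test set

# Histogram both samples on a shared set of bins so the distributions align.
bins = np.histogram_bin_edges(
    np.concatenate([train_feature, test_feature]), bins=30
)
p, _ = np.histogram(train_feature, bins=bins, density=True)
q, _ = np.histogram(test_feature, bins=bins, density=True)

# Add a small epsilon so the divergence stays finite where a bin is empty.
eps = 1e-10
kl = entropy(p + eps, q + eps)
print(f"KL(train || test) = {kl:.3f}")
```

A value near zero suggests matching distributions; note KL is asymmetric, so you may want to compute it in both directions (or use a symmetrized variant).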
  • The problem of covariate shift ultimately results in datasets with different underlying mathematical structure. Manifold Learning estimates a low-dimensional representation of high-dimensional data, thereby revealing that underlying structure. Manifold Learning techniques are often not linear projections, and are therefore different from, and more powerful than, standard PCA.

    I've used Manifold Learning techniques (e.g. Isomap, MDS) to visualize (and, where possible, quantify) the "(dis)similarity" between train and test datasets.

    2017-07-17 13:03:53
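A minimal sketch of the Isomap-based comparison described above, using synthetic stand-ins for flattened image features. The centroid-gap ratio at the end is one illustrative way to quantify the (dis)similarity, not the answerer's exact method.

```python
import numpy as np
from sklearn.manifold import Isomap

rng = np.random.default_rng(2)
X_train = rng.normal(0.0, 1.0, size=(200, 50))  # placeholder image features
X_test = rng.normal(1.0, 1.0, size=(200, 50))   # shifted stand-in for test images

# Embed both sets together so they share one low-dimensional space.
X = np.vstack([X_train, X_test])
embedding = Isomap(n_components=2, n_neighbors=10).fit_transform(X)
emb_train, emb_test = embedding[:200], embedding[200:]

# Visual inspection would plot emb_train vs emb_test; as a crude
# quantification, compare the distance between the two centroids to
# the average within-set spread.
gap = np.linalg.norm(emb_train.mean(axis=0) - emb_test.mean(axis=0))
spread = 0.5 * (emb_train.std() + emb_test.std())
print(f"centroid gap / spread = {gap / spread:.2f}")
```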