Despite the presence of convolutions, convolutional neural networks can be highly unstable to shifts in their input due to downsampling operations (typically in the form of strided convolutions and pooling layers).  This results in the loss of shift-invariance in CNN classifiers and shift-equivariance in networks like U-Nets used for image-to-image regression tasks.

We address this challenge by proposing novel stride layers: adaptive polyphase downsampling (APS-D) and upsampling (APS-U). APS-D allows CNN classifiers to achieve 100% consistency in classification performance under shifts, without any loss in accuracy, whereas APS-U enables perfect shift equivariance in symmetric encoder-decoder based CNNs on image reconstruction tasks. With our proposed method, CNNs exhibit perfect shift invariance even before training, thus making them the first approach that enables truly shift invariant convolutional neural networks.

Project video:

Project members: Anadi Chaman and Ivan Dokmanić.

Adaptive Polyphase Downampling (APS-D)

One of the primary reasons CNNs lose robustness to shifts is the use of downsampling (popularly known as stride) layers in the networks.


Downsampling an image and its shift can result in significantly different outputs. The sensitivity of this operation to shifts results in loss in robustness of CNNs to translations. Note that strided pooling layers like averaged or max-pooling can be expressed in the form of downsampling which explains their instability to shifts as well.

While existing methods like data augmentation and anti-aliasing help improve robustness to shifts, they both have limitations and can not result in complete shift invariance.

In this work, we present Adaptive Polyphase Sampling (APS), an easy-to-implement non-linear downsampling scheme that completely gets rid of this problem. The resulting CNNs yield 100% consistency in classification performance under shifts without any loss in accuracy. In fact, unlike prior works, the networks exhibit perfect consistency even before training, making it the first approach that makes CNNs truly shift invariant.

Unlike classical subsampling methods that always sample an image along a fixed grid, APS chooses the sampling grid adaptively in a way to ensure consistency with shifts.






Truly Shift-Invariant Convolutional Neural Networks
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021
    author    = {Chaman, Anadi and Dokmani{\'c}, Ivan},
    title     = {Truly Shift-Invariant Convolutional Neural Networks},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    url       = {},
    pages     = {3773-3783}
Truly shift-equivariant convolutional neural networks with adaptive polyphase upsampling
55th Asilomar Conference on Signals, Systems, and Computers, 2021
  title={Truly shift-equivariant convolutional neural networks with adaptive polyphase upsampling},
  author={Chaman, Anadi and Dokmani{\'c}, Ivan},
  booktitle={55th Asilomar Conference on Signals, Systems, and Computers},
  url = {}