The performance of computer vision models depends on their labeled training data: its quality, its diversity and its amount. However, collecting enough quality data is a tough nut to crack. One workaround is to hardcode image symmetries into neural network architectures to improve performance. Another is to have experts manually design data augmentation methods, such as flipping and rotation, which are important elements in training high-quality vision models. Until recently, however, little attention was paid to using machine learning to automatically augment existing data. The results of AutoML efforts raise the question of whether the data augmentation procedure could be automated too.
“AutoAugment: Learning Augmentation Policies from Data” explores a reinforcement learning algorithm that increases both the amount and the diversity of data in an existing dataset. Data augmentation teaches a vision model about the image invariances in a dataset so that the neural network becomes invariant to these symmetries, which improves its performance. Previous state-of-the-art deep learning models relied on manually designed data augmentation policies, whereas AutoAugment uses reinforcement learning to find image transformation policies from the existing dataset. The result is better-performing computer vision models without depending on newly collected datasets.
So how is training data augmented?
The idea behind data augmentation is simple. Images contain many symmetries that do not change the information they carry: the mirror reflection of a dog, for example, is still a dog. Not all of these invariances are obvious to humans, however. The mixup method, for instance, augments data by blending images on top of one another during training, and this improves neural network performance.
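To make these two kinds of augmentation concrete, here is a minimal sketch that is not taken from the AutoAugment work. `horizontal_flip` produces the mirror image, and `mixup` blends two images and their one-hot labels with a weight drawn from a Beta(alpha, alpha) distribution, which is how the mixup method combines examples. Both function names are hypothetical, images are assumed to be nested lists of pixel values so the example needs no extra libraries, and `alpha=0.2` is an assumed default.

```python
import random

def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]

def mixup(image_a, label_a, image_b, label_b, alpha=0.2, rng=random):
    """Blend two images and their one-hot labels with a weight lam ~ Beta(alpha, alpha).

    alpha=0.2 is an assumed default, not a value from the text.
    """
    lam = rng.betavariate(alpha, alpha)
    # Pixel-wise convex combination: lam * a + (1 - lam) * b.
    mixed_image = [
        [lam * pa + (1 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    # Labels are mixed with the same weight, so the training target becomes "soft".
    mixed_label = [lam * la + (1 - lam) * lb for la, lb in zip(label_a, label_b)]
    return mixed_image, mixed_label, lam

cat, dog = [[0.1, 0.9], [0.4, 0.6]], [[0.8, 0.2], [0.5, 0.3]]
print(horizontal_flip(cat))  # [[0.9, 0.1], [0.6, 0.4]]
image, label, lam = mixup(cat, [1.0, 0.0], dog, [0.0, 1.0], rng=random.Random(0))
print(round(lam, 3), label)
```

In a real training pipeline these operations would run on tensors, typically with one mixing weight per batch; plain lists are used here only to keep the sketch self-contained.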
In this context, AutoAugment automatically designs custom data augmentation policies for computer vision datasets by selecting among basic image transformation operations, such as rotating an image or changing its color. It predicts which transformations to combine, as well as the per-image probability of applying each one and its magnitude, so that images are not always transformed in the same way. This lets AutoAugment search for an optimal policy in a huge space of about 2.9 × 10^32 possible combinations of image transformations.
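The size of that search space follows from simple counting. The sketch below assumes the discretization described in the AutoAugment paper: 16 candidate operations, 10 magnitude levels, 11 probability levels, and a policy made of 5 sub-policies of 2 operations each. Every operation slot independently picks an operation, a magnitude and a probability, so the count is a product raised to the number of slots. The constant and function names are illustrative.

```python
# Discretization assumed from the AutoAugment paper.
NUM_OPERATIONS = 16          # candidate image transformations (rotate, shear, color, ...)
NUM_MAGNITUDES = 10          # discrete magnitude levels per operation
NUM_PROBABILITIES = 11       # probability levels 0.0, 0.1, ..., 1.0
OPS_PER_SUB_POLICY = 2       # each sub-policy applies two operations in sequence
SUB_POLICIES_PER_POLICY = 5  # a full policy is a set of five sub-policies

def search_space_size():
    """Count distinct policies: each slot picks an operation, a magnitude and a probability."""
    choices_per_slot = NUM_OPERATIONS * NUM_MAGNITUDES * NUM_PROBABILITIES  # 1760
    slots = OPS_PER_SUB_POLICY * SUB_POLICIES_PER_POLICY                    # 10
    return choices_per_slot ** slots

print(f"{search_space_size():.1e}")  # 2.9e+32
```

That is, (16 × 10 × 11)^10 = 1760^10 ≈ 2.85 × 10^32, which rounds to the 2.9 × 10^32 quoted above and is far too large to search exhaustively.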
The transformations it learns depend on the dataset it is run on. For the Street View House Numbers (SVHN) dataset, which contains natural scene images of digits, it focuses on geometric transforms such as translation and shearing, which represent common distortions in that dataset. AutoAugment also learns to invert colors completely, reflecting color inversions that occur naturally in the original SVHN dataset.
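The following sketch shows, under simplifying assumptions, how a sub-policy of (operation, probability, magnitude) triples might be applied to a single image. Each operation fires independently with its own probability, so two passes over the same image can produce different results. The operations are toy pure-Python stand-ins on 8-bit grayscale images stored as nested lists; `shear_x` is a crude integer-shift approximation of a real shear, only non-negative magnitudes are handled, and the example sub-policy is hypothetical rather than one AutoAugment actually learned for SVHN.

```python
import random

def invert(image, _magnitude):
    """Invert 8-bit pixel values; the magnitude is ignored for this operation."""
    return [[255 - p for p in row] for row in image]

def translate_x(image, magnitude):
    """Shift every row right by `magnitude` pixels, filling vacated pixels with 0."""
    width = len(image[0])
    shift = max(0, min(int(magnitude), width))
    return [[0] * shift + row[: width - shift] for row in image]

def shear_x(image, magnitude):
    """Crude horizontal shear: row i is shifted right by round(magnitude * i) pixels."""
    width = len(image[0])
    sheared = []
    for i, row in enumerate(image):
        shift = max(0, min(round(magnitude * i), width))
        sheared.append([0] * shift + row[: width - shift])
    return sheared

# Hypothetical operation table; names loosely follow the operations discussed in the text.
OPERATIONS = {"Invert": invert, "TranslateX": translate_x, "ShearX": shear_x}

def apply_sub_policy(image, sub_policy, rng=random):
    """Apply each (name, probability, magnitude) step in order, each with its own probability."""
    for name, probability, magnitude in sub_policy:
        if rng.random() < probability:
            image = OPERATIONS[name](image, magnitude)
    return image

# Hypothetical SVHN-style sub-policy, not a policy reported for AutoAugment.
sub_policy = [("ShearX", 0.7, 0.5), ("Invert", 0.4, None)]
digit = [[0, 255, 0], [0, 255, 0], [0, 255, 0]]
print(apply_sub_policy(digit, sub_policy, rng=random.Random(1)))
```

Sampling the operations per image, rather than applying a fixed transformation to the whole dataset, is what produces the extra diversity described earlier.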
On CIFAR-10 and ImageNet, however, AutoAugment does not use shearing, because these datasets generally do not contain images of sheared objects. Nor does it invert colors completely, since doing so would produce unrealistic images. Instead, AutoAugment slightly adjusts the color and hue distribution while keeping the general color properties intact. This suggests that the actual colors of objects matter in CIFAR-10 and ImageNet, whereas only relative colors matter in SVHN.
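A small numeric illustration of the relative-versus-absolute color argument, using single RGB pixels: full inversion keeps the per-channel contrast between a digit and its background, so the digit stays readable, but it changes absolute colors drastically. A mild saturation adjustment, sketched here as a blend toward the pixel's luma in the spirit of a color-enhancement operation, changes absolute colors only slightly. The pixel values, the 0.9 factor and the helper names are hypothetical.

```python
def invert_rgb(pixel):
    """Full color inversion of an 8-bit RGB pixel."""
    return tuple(255 - c for c in pixel)

def adjust_saturation(pixel, factor):
    """Blend an RGB pixel toward its luma: factor 1.0 keeps it, 0.0 makes it gray."""
    r, g, b = pixel
    luma = 0.299 * r + 0.587 * g + 0.114 * b  # ITU-R BT.601 luma weights
    return tuple(min(255, max(0, round(luma + factor * (c - luma)))) for c in pixel)

def channel_contrast(a, b):
    """Per-channel absolute difference between two pixels (a simple 'relative color' measure)."""
    return tuple(abs(x - y) for x, y in zip(a, b))

digit, background = (30, 30, 30), (220, 220, 220)
# Inversion preserves the digit/background contrast...
print(channel_contrast(digit, background))                          # (190, 190, 190)
print(channel_contrast(invert_rgb(digit), invert_rgb(background)))  # (190, 190, 190)

# ...but moves absolute colors far from the original, while a mild
# saturation change keeps them close.
red = (200, 40, 40)
print(invert_rgb(red))              # (55, 215, 215)
print(adjust_saturation(red, 0.9))  # (189, 45, 45)
```

For a house-number digit, only the contrast matters, so inversion is a harmless augmentation; for a red object in a natural photo, the inverted cyan version would no longer look like the same object.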
AutoAugment achieved an impressive 83.54% top-1 accuracy on ImageNet and an error rate of just 1.48% on CIFAR-10, a 0.83% improvement over the default data augmentation designed manually by scientists. On SVHN, the error rate improved from 1.30% to 1.02%. Notably, AutoAugment policies are transferable: the policy found for ImageNet can also be applied to other vision datasets, such as FGVC-Aircraft, where it improves neural network performance.
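The SVHN result can also be read as a relative change. A short worked calculation using only the two error rates quoted above:

```latex
\frac{1.30\% - 1.02\%}{1.30\%} = \frac{0.28}{1.30} \approx 0.215
```

In other words, the new error rate removes roughly 21.5% of the errors that remained at the previous 1.30% level.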
Given the improved performance this algorithm delivers on many competitive computer vision datasets, its future looks bright, both for more computer vision tasks and for other domains such as language models and audio processing.