This article offers a friendly introduction to image denoising and Denoising Autoencoders (DAEs): their architecture, their importance in deep learning models, how to use them with neural networks, and how they improve models’ results.
Why do we need Denoising?
“A picture is worth a thousand words”. Just like words, the clearer the pictures, the better they are understood. Interpreting an image correctly can be crucial for humans in fields such as medicine (e.g., MRI images) or self-driving cars.
A system learns about the content of its input (for example, the features of a picture) by breaking the input data down and reconstructing an output that is as similar as possible to the input. Through this approach, the system learns about the message presented by the input data.
An autoencoder is well suited to this task, particularly in its denoising form, which has great potential for feature extraction and for understanding the components of the data as a first step before diving deep into image analysis and processing.
A quick note on Denoising Autoencoders
Briefly, a Denoising Autoencoder adds noise to the input image to corrupt the data or mask some of its values, and then reconstructs the image.
During image reconstruction, the DAE learns the input features, which improves the extraction of latent representations overall. A Denoising Autoencoder also has a lower risk of learning the identity function than a plain autoencoder, because the input is corrupted before it is fed to the network. This is discussed in detail in the following sections.
A few specifics about Denoising Autoencoders (DAEs)
Denoising is recommended when training the model, and a DAE has to do two things: first, preserve the information in the input (encode it); second, remove (undo) the noise that was added to the autoencoder input.
Denoising Autoencoders have been shown to learn edge detectors from natural image patches and larger stroke detectors from digit images. Finally, DAEs perform better than traditional denoising filters because they adapt to the input data, whereas traditional filters are not data specific.
As an example of a traditional (non-DAE) denoising approach, we analyzed images from one of the real-world challenge projects at Omdena. In that case, denoising contributed to feature extraction and thereby improved the identification of the target.
In that case study we applied special filters (such as a bilateral filter) because of their capability for efficient noise filtration, but the resulting image blurring suggested that we should consider DAEs for better denoised images in the future.
So, what are autoencoders?
At a high level, an autoencoder contains an encoder and a decoder. The network learns to encode and then reconstruct its own input, which gives rise to the name “autoencoder”. The encoder transforms the high-dimensional input into a lower-dimensional latent state, where the input is more compressed, while the decoder reverses the encoder's work on the encoded outcome and reconstructs the original image. It should be noted that a traditional (vanilla) autoencoder is not trained to recover a clean image from a corrupted input.
In denoising, the data is corrupted in some manner, for example by adding random noise, and the model is trained to predict the original uncorrupted data. A variation on this omits parts of the input instead of adding noise, so that the model learns to predict the original image. In either case, the output generated by the encoder can be stored as a feature vector and used in a supervised train-and-predict workflow.
Denoising autoencoders are versatile: they can be used to clean old, stained scanned images or to support feature selection efforts in cancer biology. For old images, the encoder compresses the input into robust latent representations, from which the decoder reconstructs the actual image. In cancer biology, the features extracted by the encoder contribute to efforts to improve cancer diagnosis.
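To make the encoder and decoder roles concrete, here is a minimal NumPy sketch of the shapes involved. It is an illustration only and not the model used later in this article: the weights are random and untrained, and the latent size of 32 and the names `encode` and `decode` are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A 28x28 image flattened to 784 values; the latent size is an assumed value.
input_dim, latent_dim = 784, 32

# Randomly initialised (untrained) weights, for illustration only.
W_enc = rng.normal(scale=0.01, size=(input_dim, latent_dim))
W_dec = rng.normal(scale=0.01, size=(latent_dim, input_dim))

def encode(x):
    """Map a batch of flattened images to the lower-dimensional latent state."""
    return np.maximum(x @ W_enc, 0.0)  # ReLU activation

def decode(z):
    """Map latent vectors back to image space, with values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dec)))  # sigmoid activation

x = rng.random((4, input_dim))  # four fake "images"
z = encode(x)                   # compressed representation
x_hat = decode(z)               # reconstruction with the input's shape
print(z.shape, x_hat.shape)
```

Training would adjust the two weight matrices so that `x_hat` moves closer to `x`; here we only show that the latent state is smaller than the input and that the decoder restores the original dimensionality.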
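The masking variation can be sketched in a few lines of NumPy. This is a hedged illustration rather than the code used in our case study: the function name `mask_pixels` and the 30% drop fraction are assumptions chosen for the example.

```python
import numpy as np

def mask_pixels(images, drop_fraction=0.3, seed=0):
    """Corrupt images by setting a random fraction of pixel values to zero."""
    rng = np.random.default_rng(seed)
    # True where the pixel is kept, False where it is dropped.
    keep = rng.random(images.shape) >= drop_fraction
    return images * keep

clean = np.ones((2, 28, 28))                    # stand-in for two clean images
masked = mask_pixels(clean, drop_fraction=0.3)  # corrupted copies with holes
print(f"fraction of pixels removed: {(masked == 0).mean():.2f}")
```

The model would then receive `masked` as input and `clean` as the target, exactly as with additive noise.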
Technical specifics of Denoising autoencoder
The idea of denoising is based on intentionally adding noise to the input data before it is presented to the model. The major technical specifics of this approach are as follows.
- The denoising autoencoder builds corrupted copies of the input images by adding random noise.
- Next, the denoising autoencoder attempts to remove the noise from the noisy input and reconstruct an output that resembles the original input. The original image is compared with the model prediction using a loss function, and the goal is to minimize that loss.
- The loss function in a denoising autoencoder measures the difference between the clean original input and the reconstruction produced from its corrupted copy.
- Denoising helps the autoencoder learn the latent representation in the data and makes a robust representation of useful data possible, which supports the recovery of the clean original input.
- Finally, because the corruption (noise addition) is random, a denoising autoencoder can be considered a stochastic autoencoder.
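The points above can be summarized in symbols. The notation here is our own assumption, following a common textbook formulation rather than anything specific to this article: $x_i$ is a clean image, $\tilde{x}_i$ is its corrupted copy drawn from a corruption process $q$, $f$ is the encoder, $g$ is the decoder, and $\mathcal{L}$ is a per-image loss.

```latex
\tilde{x}_i \sim q(\tilde{x} \mid x_i), \qquad
J = \frac{1}{N} \sum_{i=1}^{N} \mathcal{L}\bigl(g(f(\tilde{x}_i)),\; x_i\bigr)
```

The important detail is that the network sees only the corrupted $\tilde{x}_i$, while the loss is measured against the clean $x_i$.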
Now, let us get into our Neural Network!
In this article, the MNIST digit dataset (each image: 28 × 28 pixels) is used for the DAE case study, since it is a standard dataset for deep learning and computer vision. The neural network applied in this case study is a Convolutional Neural Network (CNN), which is the preferred choice for image datasets because it is effective at capturing spatial features.
The DAE process starts with loading the dataset and normalizing the pixel values. Then random noise (for example, generated with NumPy's “np.random.normal”) is added to the training and test datasets. The noisy images are used as the input to the encoder, and the original images are used as the target output to train and test the model. In terms of the formula “F(X) = Y”, X is the original noise-free image and Y is the noisy image.
The dataset is split into training and validation sets: 80% of the images are used for training and the remaining 20% are kept for validation. We take advantage of the Keras API's “validation_data” argument to the model.fit() function, which returns a history object summarizing the selected loss for each training epoch (100 epochs in this case).
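An 80/20 split like the one described can be sketched with NumPy. This is a minimal example under our own assumptions: the helper name `train_val_split` is hypothetical, and the tiny arrays stand in for the noisy and clean image sets.

```python
import numpy as np

def train_val_split(inputs, targets, val_fraction=0.2, seed=0):
    """Shuffle paired arrays together and hold out a fraction for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(inputs))
    n_val = int(len(inputs) * val_fraction)
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (inputs[train_idx], targets[train_idx]), (inputs[val_idx], targets[val_idx])

# Ten fake samples: "noisy" inputs paired with their "clean" targets.
noisy = np.arange(10, dtype=float).reshape(10, 1)
clean = noisy.copy()

(train_x, train_y), (val_x, val_y) = train_val_split(noisy, clean)
print(len(train_x), len(val_x))
```

The held-out pair can then be passed to Keras as `validation_data=(val_x, val_y)`. Because inputs and targets are indexed with the same permutation, each noisy image stays matched with its clean original.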
# Code example: adding noise
# (x_train_noisy and x_test_noisy are the noisy copies created in the noise step described above)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)
x_test_noisy = np.clip(x_test_noisy, 0., 1.)
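For readers who want to see the whole preparation step in one place, here is a hedged sketch of normalizing, adding Gaussian noise, and clipping. Several details are our own assumptions and are not stated in the article: the helper name `add_gaussian_noise`, the noise factor of 0.5, and the random array that stands in for the MNIST images.

```python
import numpy as np

def add_gaussian_noise(images, noise_factor=0.5, seed=0):
    """Add zero-mean Gaussian noise and clip the result back to [0, 1]."""
    rng = np.random.default_rng(seed)
    noisy = images + noise_factor * rng.normal(loc=0.0, scale=1.0, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

# Stand-in for raw 8-bit images; in the article these come from MNIST.
raw = np.random.default_rng(1).integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

x = raw.astype("float32") / 255.0  # normalize pixel values to [0, 1]
x = x.reshape(-1, 28, 28, 1)       # add the channel axis expected by a CNN
x_noisy = add_gaussian_noise(x)    # corrupted copies used as model input

print(x_noisy.shape, float(x_noisy.min()), float(x_noisy.max()))
```

Clipping matters because the decoder ends in a sigmoid, so both inputs and targets are expected to stay within the [0, 1] range.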
During training, a loss function measures the difference between the denoised (reconstructed) image from the decoder and the original noise-free image, and in every iteration the network computes this loss and attempts to minimize it. The loss can be similar to root mean squared error (RMSE); the model in this article is compiled with binary cross-entropy. The validation loss compares the network output ($\hat{y}_i$) with the actual output ($y_i$) on the validation inputs and can be written as $J = \frac{1}{N}\sum_{i=1}^{N} \mathcal{L}(\hat{y}_i, y_i)$, where $\mathcal{L}$ is the individual loss computed from the difference between the predicted output and the target.
The convolutional neural network model consists of two major parts: the encoder, which performs feature extraction (via convolutional and pooling layers), and the decoder, which reconstructs the image (via convolutional and upsampling layers). With the encoder, the image is scanned using filters and its depth (number of channels) is increased so that richer features can be extracted, while the decoder reconstitutes the same image.
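To see how the averaged loss J is computed, the following NumPy sketch evaluates two possible per-image losses on a toy example. It is an illustration under our own assumptions: the function names are hypothetical, and the two-pixel "images" are made up for the arithmetic.

```python
import numpy as np

def mse(y_pred, y_true):
    """Per-image mean squared error."""
    return np.mean((y_pred - y_true) ** 2, axis=1)

def binary_crossentropy(y_pred, y_true, eps=1e-7):
    """Per-image binary cross-entropy, with clipping to avoid log(0)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred), axis=1)

def average_loss(per_image_loss, y_pred, y_true):
    """J = (1/N) * sum of the individual losses over the N images."""
    return float(np.mean(per_image_loss(y_pred, y_true)))

# Two flattened two-pixel "images": targets and reconstructions.
y_true = np.array([[0.0, 1.0], [1.0, 0.0]])
y_pred = np.array([[0.1, 0.9], [0.8, 0.2]])

j_mse = average_loss(mse, y_pred, y_true)
rmse = float(np.sqrt(j_mse))
j_bce = average_loss(binary_crossentropy, y_pred, y_true)
print(j_mse, rmse, j_bce)
```

Both losses shrink as the reconstruction approaches the target, so either can serve as the individual loss inside J.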
# Define the convolutional model
from tensorflow import keras
from tensorflow.keras import layers

input_img = keras.Input(shape=(28, 28, 1))

# Encoder: convolution followed by pooling
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
x = layers.MaxPooling2D((2, 2), padding='same')(x)
encoded = layers.MaxPooling2D((2, 2), padding='same')(x)
# At this point the representation is (7, 7, 32)

# Decoder: convolution followed by upsampling
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(encoded)
x = layers.UpSampling2D((2, 2))(x)
x = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(x)
x = layers.UpSampling2D((2, 2))(x)
decoded = layers.Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

autoencoder = keras.Model(input_img, decoded)
autoencoder.compile(optimizer='adam', loss='binary_crossentropy')

# Code example (model (autoencoder) validation)
# Noisy images are the input, clean images are the target
autoencoder.fit(x_train_noisy, x_train, epochs=100, batch_size=128, shuffle=True, validation_data=(x_test_noisy, x_test))
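It can help to trace how the image shape changes through the layers listed above. The sketch below does this with plain Python and no deep learning library. The helper functions are our own and only model the shape arithmetic for stride-1 convolutions with 'same' padding, 2x2 pooling with 'same' padding, and 2x upsampling.

```python
import math

def conv_same(shape, filters):
    """Conv2D with padding='same' and stride 1: size unchanged, depth = filters."""
    height, width, _ = shape
    return (height, width, filters)

def pool_same(shape, size=2):
    """MaxPooling2D with padding='same': height and width divided, rounded up."""
    height, width, channels = shape
    return (math.ceil(height / size), math.ceil(width / size), channels)

def upsample(shape, size=2):
    """UpSampling2D: height and width multiplied by the factor."""
    height, width, channels = shape
    return (height * size, width * size, channels)

layers_in_order = [
    ("Conv2D(32)", lambda s: conv_same(s, 32)),
    ("MaxPooling2D", pool_same),
    ("MaxPooling2D (encoded)", pool_same),
    ("Conv2D(32)", lambda s: conv_same(s, 32)),
    ("UpSampling2D", upsample),
    ("Conv2D(32)", lambda s: conv_same(s, 32)),
    ("UpSampling2D", upsample),
    ("Conv2D(1)", lambda s: conv_same(s, 1)),
]

shape = (28, 28, 1)
trace = [shape]
for name, apply_layer in layers_in_order:
    shape = apply_layer(shape)
    trace.append(shape)
    print(f"{name:24s} -> {shape}")
```

The trace confirms that the encoder compresses the image to (7, 7, 32) and that the decoder returns it to the original (28, 28, 1), which is required for comparing the output with the clean image pixel by pixel.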
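Overfitting happens when: validation loss >> training loss, that is, the model keeps improving on the training images while its error on unseen validation images stays high or rises.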
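A simple way to check for this is to compare the two loss curves after training. The sketch below is one possible heuristic, not a standard rule: the function names, the `patience` of three epochs, and the loss values are all assumptions made for the example.

```python
def best_epoch(val_losses):
    """Index of the epoch with the lowest validation loss."""
    return min(range(len(val_losses)), key=val_losses.__getitem__)

def is_overfitting(train_losses, val_losses, patience=3):
    """True if validation loss stalled for `patience` epochs while training loss kept falling."""
    best = best_epoch(val_losses)
    stalled = (len(val_losses) - 1 - best) >= patience
    still_fitting = train_losses[-1] < train_losses[best]
    return stalled and still_fitting

# Made-up loss curves: training keeps improving, validation turns upward.
train_curve = [0.30, 0.20, 0.15, 0.12, 0.10, 0.09]
val_curve = [0.31, 0.22, 0.19, 0.20, 0.21, 0.23]

print(best_epoch(val_curve), is_overfitting(train_curve, val_curve))
```

With Keras, the two lists would come from the object returned by model.fit(), as `history.history['loss']` and `history.history['val_loss']`. Keras also provides an `EarlyStopping` callback that stops training once the monitored validation loss stops improving.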
DAEs avoid learning the identity function and, unlike traditional filtering approaches used for reducing noise, do not produce overly smooth images; they are also fast to compute. In general, DAEs use an improved autoencoder approach, which is mainly based on introducing noise to the input and reconstructing the output from the corrupted image.
Such a modification of the general autoencoder approach prevents DAEs from simply copying the input to the output, and so requires them to first remove the noise from the input before extracting the meaningful data.
In our DAE approach, a CNN was applied because of its effectiveness in denoising and in preserving spatial relations within the image. In addition, a CNN helps reduce dimensionality and computational complexity when arbitrary-sized images are used as input.
The next plan is to apply the tested DAE from this article for an audio denoising case study. We aim to compare the performance of this model on audio data with the results from the current study on the MNIST Digit Dataset.
Last but not least I would like to express my appreciation to my collaborator Sijuade Oguntayo for sharing his knowledge and expertise in supporting me through this AI journey from the work on the DAE model to the finalized article.