Anomaly detection on the surface of Mars poses a few unique challenges: finding publicly available datasets such as landing-site images, working with deep convolutional networks, and coping with the large variety of surface anomalies. Still, a team of 30+ engineers took on the challenge of building an anomaly detection model using NASA data from the surface of Mars.
I had the wonderful opportunity to work on the Omdena AI challenge “Anomaly detection in Martian Surface”. The objective of this project is to detect anomalies on the Martian surface caused by non-terrestrial artifacts such as debris from Mars lander missions, rovers, etc.
In the past, I have attempted a few competitions on ‘Kaggle’ and ‘Analytics Vidhya’. Although they helped me improve my understanding of the AI domain, I felt the learning exposure was limited by the competition context. When I came across ‘Omdena’, their tag line of ‘learn by collaboration’ got me interested at once. I believe the best learning happens when we work together as a team with a common goal. What I learned in the past couple of months in this AI challenge is much more than what I learned from all the competitions combined.
Recently the search for so-called “Techno-Signatures”, measurable properties or effects that provide scientific evidence of past or present extraterrestrial technology, has gained new interest. NASA hosted a “Techno-Signature” Workshop at the Lunar and Planetary Institute in Houston, Texas, in September 2018 to learn more about the current field and state of the art of searches for “Techno-Signatures”, and what role NASA might play in these searches in the future. One area in this field of research is the search for non-terrestrial artifacts in the Solar System. This AI challenge is aimed at developing an ‘AI Toolbox’ for Planetary Scientists to help in identifying non-terrestrial artifacts.
This AI challenge came with multiple difficulties, starting with the data set.
Challenges in the data set
Detecting anomalies on the Martian surface poses a few unique challenges. Apart from the data collection difficulties and the lack of a predefined data set, the first challenge is that the Martian surface is itself very diverse. Different parts of the surface look entirely different from one another, and the same surface looks very different in different seasons. A few samples of the diverse Martian surface:
So, the algorithm should be generic enough to detect anomalies on all these diverse surfaces. The second challenge is data imbalance: the Martian surface covers about 145 million square kilometers, and on this surface we have a few hundred anomalies, each measuring no more than a few tens of feet. One could consider using data augmentation to overcome the imbalance, but anomalies created in different terrains would still look different, and we don’t have labeled data across different terrains.
The next challenge is that the data population is already polluted with anomalies. Typical anomaly detection algorithms train an AI model on samples of the normal population and then use the model to predict how much the target data deviates from that population. This approach cannot be applied as is here, because no anomaly-free training set is available.
Creating a data set
The data set I used for this exercise is the MSL Hardware images 12 Days after Landing, https://www.uahirise.org/ESP_028401_1755. In this image, folks from the University of Arizona have annotated the places where the hardware created anomalies on the Martian surface. This seems to be a perfect data set for an initial evaluation.
The complete HiRISE image is about 550 MB in size, with a resolution of 25516 x 45788 pixels. On the left is the compressed browse image of the full HiRISE image. The anomalies are hardly visible in this image. Even in the full resolution image, it is very difficult to spot the anomalies unless one knows exactly where to look.
To create a data set out of this image, I split the 25516 x 45788 pixel image into small chunks of 256 x 256 pixels with a stride of 256 pixels. As you can see, the image has black borders, and black borders are not good for the AI models. To reduce the pre-processing effort, I discarded the chunks on the borders and used only the chunks from the center of the image. This results in a data set of about 10K images. Among these 10K images, there are 6 anomalies spread across 50 images.
The following are sample images of the 6 anomalies present in the data set:
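As a rough illustration of the chunking step, here is a minimal sketch that only computes the top-left corners of the chunks. The function name `tile_origins` and the `margin` parameter (how many border pixels to skip) are my own assumptions for illustration; the actual cropping of the black borders was not done with this exact code.

```python
def tile_origins(width, height, tile=256, stride=256, margin=0):
    """Return (x, y) top-left corners of all tiles that fit inside the image.

    `margin` is a hypothetical number of border pixels to skip on every side,
    standing in for the removal of chunks that touch the black borders.
    """
    xs = range(margin, width - margin - tile + 1, stride)
    ys = range(margin, height - margin - tile + 1, stride)
    return [(x, y) for y in ys for x in xs]


# Full image, no margin: 99 columns x 178 rows of 256 x 256 chunks.
origins = tile_origins(25516, 45788)
print(len(origins))
```

Without any margin the full image yields 99 x 178 = 17,622 chunks, so keeping roughly 10K chunks means that a sizeable band around the borders is dropped. The same function with `stride=16` produces overlapping chunks.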
The intuition behind the algorithm
The primary motivation is to take advantage of the challenges in the data set and apply the “Anomaly Detection using Deep Learning based Image Completion” approach as mentioned in the publication: https://arxiv.org/abs/1811.06861
The idea is to use a deep convolutional neural network for patch-wise completion of surface images, with the aim of detecting anomalies on the surface. The approach in the paper is to train the model exclusively on normal data: the center 32 x 32 pixels are masked (cut out), and the model is trained to reconstruct the masked-out region. To detect anomalies, the center 32 x 32 pixels of the query image are masked and the model is used to reconstruct a fault-free clone of the masked region. By subtracting the generated region from the corresponding query region, a pixel-wise anomaly score is obtained, which is then used to detect anomalies.
In our data set, since anomalies are expected to be very rare compared to normal images, we use the entire data set as is to train the model. The trained model is then used to compute the anomaly score on the same data set. Since the model will generalize on normal images, it is expected to give high anomaly scores for anomalies.
Let’s check if this approach can find these 6 needles in a haystack of 10K images.
Feed-forward generative DCNN
Image completion tasks typically have the aim to complete missing regions of an image in the most natural-looking way. Besides being semantically meaningful, the in-paint must also look as authentic as possible. For this reason, feed-forward in-painting DCNNs are often trained jointly with an adversarial network. The adversarial network has the objective to discriminate between fake and real images. In contrast, the generative model must increase the error rate of the adversarial network by generating realistic images. Although this additional adversarial loss indeed causes in-paints to look more realistic, it has no positive effect on pixel-wise matching of the missing part of the image. Training with the joint loss function even increases the pixel-wise reconstruction error, which is undesirable behavior for anomaly detection. For this reason, the paper trains the feed-forward generative DCNN with reconstruction loss only.
The DCNN network consists of 17 layers as shown in the figure above. After the third layer, the resolution of the feature maps is halved by strided convolution. In order to increase the receptive fields of the output neurons, a series of dilated convolutions is used (layers 7 – 10). Up-scaling back to the input size at layer 13 is performed by bilinear rescaling followed by a convolution. Mirror padding is used for all convolutional layers. Further, Exponential Linear Unit (ELU) activation functions are used.
The network is trained with L1 reconstruction loss. The 32 x 32 center region, defined by the binary mask M, is weighted differently from the remaining region. With X being the image patch to be inspected, the network is trained with the following loss function:
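A loss of the kind described here can be sketched in a few lines of NumPy. This is only an illustration under my own assumptions: the weight `center_weight=0.9`, the normalization by the total pixel count, and the function names are hypothetical and are not taken from the paper or from the code used in this project.

```python
import numpy as np


def center_mask(size=128, hole=32):
    """Binary mask M: 1 inside the central hole x hole region, 0 elsewhere."""
    mask = np.zeros((size, size), dtype=np.float64)
    start = (size - hole) // 2
    mask[start:start + hole, start:start + hole] = 1.0
    return mask


def weighted_l1_loss(x, reconstruction, mask, center_weight=0.9):
    """L1 reconstruction loss with the masked center weighted differently.

    center_weight is an assumed value; the remaining region gets
    (1 - center_weight). Both terms are divided by the number of pixels.
    """
    abs_err = np.abs(x - reconstruction)
    n = x.size
    center_term = center_weight * np.sum(mask * abs_err) / n
    context_term = (1.0 - center_weight) * np.sum((1.0 - mask) * abs_err) / n
    return center_term + context_term


mask = center_mask()
x = np.ones((128, 128))
corrupted = x * (1.0 - mask)  # network input: patch with the center cut out
print(weighted_l1_loss(x, corrupted, mask))
```

In training, `reconstruction` would be the network output for the corrupted patch; here the corrupted patch itself is passed in only to exercise the function.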
The image chunks of 256 x 256 are resized to 128 x 128 and the image completion network is fed with image patches of size 128 x 128. The patches are corrupted by masking out the central area of size 32 x 32. This large ratio between known and unknown image content provides the network with more semantic information to complete the center region. After the reconstruction of the corrupted image by the network, the pixel-wise absolute difference between the reconstructed image and the query image is computed as a loss value by the above-mentioned loss function. The model is trained for 200 epochs.
For anomaly detection, only the 24 x 24 center region of this absolute difference image is used. For image patches in which defects appear close to the border of the cut-out 32 x 32 center region, the neural network seems to generate local continuations of the bordering defects. By considering only the 24 x 24 center region, these unwanted continuations are mostly excluded.
After training the model for 200 epochs, the model is used to predict the anomaly score on the same data set again. The resulting anomaly score distribution looks like this:
Listing the images whose scores are 3 standard deviations away from the sample mean gives us 99 image files. 99 out of 10K images is not bad. Cross-referencing these file names with the anomaly image chunks shows that, among the 99 images, only two anomaly categories, ‘Parachute’ and ‘Descent-Stage-Crash-Site’, were detected by this model. It missed ‘Heat-Sheild’, ‘2-new-spots-spot1’, ‘2-new-spots-spot2’, and ‘Curiosity-Rover’. Out of the 99 images, only 7 were of anomalies. Not a good result!
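The scoring step can be sketched as follows. How the pixel-wise differences are reduced to a single score per image is an assumption on my part (a plain mean over the 24 x 24 window); the function name `anomaly_score` is likewise hypothetical.

```python
import numpy as np


def anomaly_score(query, reconstruction, region=24):
    """Mean absolute difference over the central region x region window.

    Taking the mean is an assumed reduction; a max or a sum over the
    window would be equally plausible choices.
    """
    diff = np.abs(query.astype(np.float64) - reconstruction.astype(np.float64))
    h, w = diff.shape
    top, left = (h - region) // 2, (w - region) // 2
    return float(diff[top:top + region, left:left + region].mean())


query = np.zeros((128, 128))
recon = np.zeros((128, 128))
recon[64, 64] = 1.0  # a single deviating pixel in the middle of the patch
print(anomaly_score(query, recon))
```

For a 128 x 128 patch the scored window covers rows and columns 52 to 75, so a difference at pixel (48, 48), which lies inside the masked 32 x 32 region but near its border, does not contribute to the score.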
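The 3-standard-deviation cut can be expressed in a few lines. This sketch assumes that only scores above the mean are of interest (a one-sided cut) and uses the population standard deviation; both choices, as well as the names `flag_outliers` and the example file names, are hypothetical.

```python
import statistics


def flag_outliers(scores, k=3.0):
    """Return the names whose score exceeds mean + k * standard deviation.

    `scores` maps a chunk file name to its anomaly score.
    """
    values = list(scores.values())
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    threshold = mean + k * std
    return sorted(name for name, score in scores.items() if score > threshold)


# Toy data: 99 ordinary chunks and one chunk with a much higher score.
scores = {f"chunk_{i:03d}.png": 1.0 for i in range(99)}
scores["chunk_odd.png"] = 100.0
print(flag_outliers(scores))
```

The returned file names can then be compared against the list of annotated anomaly chunks to count true and false detections.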
On closely analyzing the anomaly images that were not detected, it turns out that they have the anomalies toward the side of the image, i.e., not in the 32 x 32 center of the image. Since the DCNN model is trained to reconstruct the center 32 x 32 pixels, splitting the image into chunks with a stride of 16 ensures that each anomaly occupies the center 32 x 32 part of some image chunks.
So, I recreated the initial data set with image chunks cut at a stride of 16. Instead of retraining the model on this new data set, the previously trained model is used to predict the anomaly scores on the stride-16 data set. The resulting anomaly score distribution looks like this:
We can clearly see a spike in the number of images on the far right of the distribution. Listing the images whose scores are 3 standard deviations away from the sample mean gives us 705 image files. Cross-referencing these file names with the anomaly image chunks shows that all 705 files belong to four types of anomalies: ‘Heat-Sheild’, ‘Parachute’, ‘Descent-Stage-Crash-Site’, and ‘Curiosity-Rover’. And zero false positives. Voila, 4 out of 6 needles are found!
This approach still missed the ‘2-new-spots-spot1’ and ‘2-new-spots-spot2’ anomalies. These anomalies are tiny, and since the images are rescaled from 256 x 256 to 128 x 128, the anomaly scores generated by these images are lower.
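To see why the stride matters so much, one can count how much of the surface ever lands inside a scored center window. The sketch below works along one axis only and in original image pixels; it assumes that the 24 x 24 scored window of the resized 128 x 128 patch corresponds to 48 x 48 pixels of the 256 x 256 chunk. The function name `center_coverage` and the strip length of 2560 pixels are made up for the example.

```python
def center_coverage(length, tile=256, stride=256, center=48):
    """Fraction of positions along one axis covered by some tile's center window."""
    offset = (tile - center) // 2
    covered = [False] * length
    for start in range(0, length - tile + 1, stride):
        for p in range(start + offset, start + offset + center):
            covered[p] = True
    return sum(covered) / length


print(center_coverage(2560, stride=256))  # non-overlapping chunks
print(center_coverage(2560, stride=16))   # overlapping chunks
```

With a stride of 256 only 48 of every 256 pixels along an axis are scored, which is about 19% per axis and roughly 3.5% of the area in two dimensions. With a stride of 16, which is smaller than the 48-pixel window, every pixel is scored except for a band near the edges of the strip.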
The approach of “Anomaly Detection using Deep Learning based Image Completion” seems to be a viable option for detecting techno-signatures on the Martian surface. The model performance can be further enhanced by:
- Training for more epochs
- Up-scaling the model configuration
- Clustering HiRISE images and training on groups of similar images so that the model generalizes better and detects anomalies more decisively.