Image recognition for volcano webcams: detecting volcanic plumes using deep Convolutional Neural Networks.
GNS Science is the leading provider of Earth, geoscience and isotope research and consultancy services in New Zealand. Since 1865, GNS and its predecessor organisations have demonstrated scientific excellence, from the time earthquakes were first identified with seismic faults in New Zealand to the recognition that radioactive isotopes could be used for geological dating. These studies and investigations continue today at scales from the atomic to the planetary. GNS inherited its 140-year heritage from four main bodies: the New Zealand Geological Survey, the DSIR Geophysics Division, the Institute of Nuclear Sciences and DSIR Geology and Geophysics. The Government of New Zealand established the Crown Research Institutes (CRIs) in 1992. The company was registered under the name Institute of Geological and Nuclear Sciences Limited, and was rebranded as GNS in 2006. The goal of the organisation is to understand the processes and resources of the natural Earth system and to turn that understanding into environmental, social and economic benefits. GNS has 390 staff at three locations: Wellington, Taupo and Dunedin. The structure and the domains are as shown.
This project falls under the Science domain, with Data Science and Geohazards Monitoring as the sub-domain. The project is based in Taupo, New Zealand, as shown in figure 1.
1.2.1 Project overview
The project blends three major technologies: GIS (Geographic Information System), image processing and machine learning. It concerns Tongariro volcano in Tongariro National Park, which is still active. GNS operates 12 web cameras that capture volcano images every 10 minutes, during both day and night. Our focus is on the images captured by camera TOTM, one of the two cameras monitoring Tongariro volcano. Broadly, there are two categories of images: clear images and obstructed images. Among the clear images, some show a clearly visible plume, while others show no plume or only a small one. The obstructed images are those with cloud in the surrounding area, which may hinder seeing the plume. Cloud in the front of the image obscures the plume, while cloud in the background makes the plume difficult to analyse.
Analysis of these images is still done manually, which is time-consuming and error-prone.
1.2.2 Project objectives
This project aims to automatically detect features of interest in the webcam images, a task that is currently done manually. These features are a key source of information when a volcano is becoming active.
Machine learning and computer vision were used to recognise features such as fumaroles, or to distinguish between clear and obstructed views. The work included building the training dataset and then implementing prototype image recognition software.
GNS imposed no particular constraints on the project.
The project uses open data, available at ftp.geonet.org.nz/volcano_cams, which contains the volcano images.
The data is retrieved over FTP.
Navigating the FTP site with Python code, it was observed that the top-level page has three columns: ‘Name’, ‘Size’ and ‘Date Modified’. ‘Name’ refers to the year in which the image was recorded. The ‘Size’ column is empty. ‘Date Modified’ refers to the last time the entry was altered.
Clicking on a name entry navigates to another page with the same three columns: ‘Name’, ‘Size’ and ‘Date Modified’. Here, ‘Name’ refers to the month of that year in which the image was captured. The ‘Size’ column is empty. ‘Date Modified’ refers to the last time the entry was altered.
This page has a total of 12 entries, one for each month.
Clicking on a month navigates to a new page, again with the columns ‘Name’, ‘Size’ and ‘Date Modified’. Here, ‘Name’ refers to the day of that month on which the image was captured. The ‘Size’ column is empty. ‘Date Modified’ refers to the last time the entry was altered.
This page has a maximum of 31 entries. For some months there are fewer than 31, depending on the number of days in the month and on whether the cameras were online or offline on a given day.
Clicking on a day opens a new page with three columns: ‘Name’, ‘Size’ and ‘Date Modified’. Here, ‘Name’ refers to the name of the camera that took the image. The ‘Size’ column is empty. ‘Date Modified’ refers to the last time the entry was altered.
This page has a maximum of 12 entries.
In some cases there are fewer than 12 entries, which means that a camera was offline.
The volcano that we are monitoring is Tongariro. Two cameras monitor the volcano: TOTM and TOKR.
Of the two, the project focuses on camera TOTM.
Clicking on camera TOTM opens another page with three columns: ‘Name’, ‘Size’ and ‘Date Modified’. Here, ‘Name’ refers to the name of the image.
The ‘Size’ column gives the size of the image in kilobytes. ‘Date Modified’ refers to the last date and time the file was modified.
The images are in .jpg format.
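The directory walk described above (year, month, day, camera, image) could be automated along the following lines. This is only a sketch: the function name `list_camera_images` is hypothetical, and it assumes that each level can be listed with an `nlst(path)` call, as offered by Python's `ftplib.FTP`, and that every level contains only the sub-directories described above.

```python
import posixpath


def list_camera_images(ftp, root, camera="TOTM"):
    """Walk root/<year>/<month>/<day>/<camera>/ and return the .jpg paths.

    `ftp` is any object with an nlst(path) method, e.g. ftplib.FTP.
    """

    def children(path):
        # Servers may return bare names or full paths; keep only the names.
        return sorted(posixpath.basename(e.rstrip("/")) for e in ftp.nlst(path))

    images = []
    for year in children(root):
        year_path = posixpath.join(root, year)
        for month in children(year_path):
            month_path = posixpath.join(year_path, month)
            for day in children(month_path):
                day_path = posixpath.join(month_path, day)
                if camera not in children(day_path):
                    continue  # the camera was offline on that day
                cam_path = posixpath.join(day_path, camera)
                images.extend(
                    posixpath.join(cam_path, name)
                    for name in children(cam_path)
                    if name.lower().endswith(".jpg")
                )
    return images
```

With a real connection, one would pass an `ftplib.FTP("ftp.geonet.org.nz")` object (after an anonymous `login()`) and `"volcano_cams"` as the root. Whether the server allows anonymous access in exactly this way is an assumption.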
The entries in the ‘Name’ column of the image files follow a fixed format. Each file name starts with the year in which the image was captured, followed by the month, the day and the time, then ‘00’, then the camera name ‘TOTM’, and finally the image file extension.
The first four digits of the file name hold the year, the next two digits the month, the next two digits the day, and the next four digits the time.
Of the four digits representing the time, the first two are the hours and the last two are the minutes, as shown in Table 2.
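The file-name layout above can be decoded with a few lines of Python. The sketch below is illustrative only: the function name `parse_image_name` and the sample file name are hypothetical, and the pattern assumes that the name is exactly twelve timestamp digits, ‘00’, the camera name and `.jpg`.

```python
import re
from datetime import datetime

# YYYYMMDDHHMM, then '00', then the camera name, then the extension.
NAME_PATTERN = re.compile(r"^(\d{12})00([A-Z0-9]+)\.jpg$")


def parse_image_name(name):
    """Return (capture time, camera name) decoded from an image file name."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    captured = datetime.strptime(match.group(1), "%Y%m%d%H%M")
    return captured, match.group(2)


# Hypothetical example name, built from the format described in the text.
captured, camera = parse_image_name("20190815103000TOTM.jpg")
```

Here `captured` is the timestamp 15 August 2019, 10:30, and `camera` is `"TOTM"`.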
Clicking on an entry in the ‘Name’ column opens the corresponding image of the volcano. Our data consists of all these images for all dates and years. The image names were gathered into a data frame using Python code.
The number of images gathered to build the dataset is 42,087.
2.1 DATA WRANGLING
2.1.1 Separating day-time images and night-time images.
Day-time images are selected using the timestamp in the image file names, keeping only those images that were taken between 9 am and 5 pm.
The night-time images were discarded, leaving a total of 23,897 images.
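A minimal sketch of this day-time filter is given below. It reads the hours and minutes from the file name as described earlier. Treating both 09:00 and 17:00 as day-time is an assumption, since the text does not say how the boundaries were handled, and the file names in the example are hypothetical.

```python
from datetime import time

DAY_START = time(9, 0)   # 9 am
DAY_END = time(17, 0)    # 5 pm (treated as inclusive here, an assumption)


def is_daytime(file_name):
    """Return True if the HHMM part of the file name lies between 9 am and 5 pm."""
    hhmm = file_name[8:12]  # characters after YYYYMMDD
    captured = time(int(hhmm[:2]), int(hhmm[2:]))
    return DAY_START <= captured <= DAY_END


# Hypothetical file names following the naming format of the images.
names = [
    "20190815085000TOTM.jpg",
    "20190815090000TOTM.jpg",
    "20190815170000TOTM.jpg",
    "20190815171000TOTM.jpg",
]
daytime_names = [name for name in names if is_daytime(name)]
```

Only the 09:00 and 17:00 images remain in `daytime_names`.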
2.1.2 Cropping the images
Cropping the images was considered when building our model. This way the model can look more specifically at the plume area, with less chance of being confused by regions that are entirely irrelevant.
The model therefore does not lose any important information; it only discards unnecessary detail, which improves performance.
As shown in figure 3, the original size of the images was (1400, 2000).
Cropping away the top part (from y=0 to y=400) and the bottom part (y>=1200) results in no loss of relevant information. Only clouds, sky and some ground area are lost. This reduces the amount of processing needed and focuses on the area where the plume comes from. The resulting data frame has the files containing only the daytime images in .jpg format.
The size of the images after cropping is (800, 2000), as shown in the result section.
2.1.3 Reducing pixels of the images
The size of the images obtained after cropping was (800, 2000). Reducing the number of pixels shortens training time, as there is less data to process.
The size of the images after reducing the pixels is (224, 224). This size was chosen because the model on which transfer learning was performed (VGG-16) takes images of this size as its standard input.
The image size after reducing the pixels is shown in the result section.
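The cropping and resizing steps of sections 2.1.2 and 2.1.3 can be sketched together as follows. The helper names are hypothetical and the use of the Pillow library is an assumption; the report does not state which image library was used. The sketch keeps the rows between y=400 and y=1200, which gives the cropped height of 800 stated above.

```python
TOP, BOTTOM = 400, 1200      # rows kept: y = 400 up to (not including) 1200
TARGET_SIZE = (224, 224)     # input size used for VGG-16 in this project


def crop_box(width, top=TOP, bottom=BOTTOM):
    """Return the (left, upper, right, lower) box that keeps rows top..bottom."""
    return (0, top, width, bottom)


def preprocess(path):
    """Crop an image to the plume area and resize it to TARGET_SIZE."""
    from PIL import Image  # imported here so crop_box works without Pillow

    with Image.open(path) as img:
        cropped = img.crop(crop_box(img.width))
        return cropped.resize(TARGET_SIZE)


box = crop_box(2000)
cropped_height = box[3] - box[1]   # 800 rows remain
```

Note that resizing a 2000-pixel-wide, 800-pixel-high crop to 224 by 224 changes the aspect ratio, so the scene is squeezed horizontally.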
2.1.4 Data Augmentation
Data augmentation was initially performed on the images to improve the performance of the model, since augmentation gives the model more data in varied forms for training.
It was later discarded because all the images are taken from a single angle. The purpose of the project is to predict plumes in new images, and these new images will have the same framing as the training images. This is due to the fixed alignment of camera TOTM, which is the focus of the project. Future images from the same camera will have the same alignment. Hence, omitting data augmentation is valid in this case.
Instead of enlarging the dataset through data augmentation, a reasonably large number of images was labelled so that the model could train well and give good results.
2.2 DATA STRATEGIES
2.2.1 Dealing with Missing values.
There is essentially no missing data. However, in some cases images are missing for some dates on the FTP site, as shown in figure 4.
The number of images in the dataset is 23,897.
The labelled data contained 19,117 images and the unlabelled data contained 4,779 images.
2.2.2 Labelling the images.
Looking at the images, there are some in which a volcanic plume can be seen. A plume is the column of gas and particles emitted by the volcano, for example during an explosive eruption.
By studying the plume, we can analyse the volcano's eruptive activity and its future chance of erupting. The research paper “The Initial Development of Transient Volcanic Plumes as a Function of Source Conditions” by Pierre‐Yves Tournigand shows that images of a volcano have a high correlation with its explosion rate.
The category of images in which the plume can be observed is called ‘clear’. The category in which no plume can be seen, either because there is none or because it is hidden by its surroundings, is called ‘obstructed’.
Within the obstructed class, there are some images in which no plume can be seen at all. However, there are also many images showing clouds that look like a plume. This can make correct labelling harder.
Hence, the two major classes are:
Clear: Images where the plume is clearly visible.
Obstructed: Images where the plume is not visible.
The obstructed class contains the images showing no plume. This may be because there is no plume present, as shown in figure 5.
The other reason for a plume not being visible is the presence of clouds, water or gas droplets in the image, as shown in figure 6. There may be clouds either in the background of the image or in front, which makes it harder to separate the images cleanly into the two categories, as displayed in table 3.
2.3.1 Process followed.
The overall aim of the project is to analyse the clear images that contain a plume, as shown in figure 7.
Milestone 1: Identify features of interest and build training dataset.
Milestone 2: Identify image pre-processing pipeline to enhance features of interest.
Milestone 3: Implement, train, and evaluate machine learning algorithm to automatically recognize features of interest.
2.3.1 Convolutional neural network.
To identify the features of an image, we use a convolutional neural network (CNN), which uses multiple layers to identify features. One of these layers is the convolutional layer. A computer recognises images in terms of the following concepts:
1. Pixels: A pixel is a single point in the image. Combined, these points make up the whole image. A CNN picks up patterns in these pixels and classifies the images accordingly. The matrices used to detect these patterns are called feature detectors, kernels or filters.
2. Feature detector: A matrix of weights used to detect a feature in the image.
3. Convolutional operation: Images are represented as numbers because a computer sees an image as an array of numbers. The numbers lie between 0 and 255 and correspond to the brightness of each pixel.
The feature detector is applied to the input image. At each position, the feature detector and the underlying patch of the input image are multiplied element-wise and the products are summed, giving one value of the output image (the feature map). We use this operation because it reduces the size of the image significantly; processing the raw pixels directly is very expensive. Some information is lost because the resulting matrix has fewer values, but most of the useful information that helps to classify the image is retained, while unnecessary components of the image are suppressed.
We get multiple feature maps because a model applies many feature detectors, and it thus learns which features are most important. For colour images, each pixel has three channels (red, green and blue), and the feature detectors are applied across all of them.
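The convolutional operation described above can be illustrated with a small worked example in plain Python. This is a sketch for a single-channel image with no padding and a stride of 1; the image and the feature detector values are made up for illustration, and real CNN layers learn their detector weights during training.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image; multiply element-wise and sum."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k_rows)
                for b in range(k_cols)
            )
            for j in range(out_cols)
        ]
        for i in range(out_rows)
    ]


# A 3x4 image that is dark on the left and bright on the right.
image = [
    [0, 0, 255, 255],
    [0, 0, 255, 255],
    [0, 0, 255, 255],
]
# A 2x2 feature detector that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]
feature_map = convolve2d(image, kernel)
# feature_map == [[0, 510, 0], [0, 510, 0]]
```

The feature map is smaller than the input (2x3 instead of 3x4) and is non-zero only where the edge lies, which is the sense in which the operation keeps useful information and discards the rest.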
2.3.2 Transfer learning
Transfer learning is used rather than training a deep network from scratch. VGG-16 is applied with a dense layer added on top, as shown in figure 8. VGG-16 improves on AlexNet by replacing the large kernel-sized filters (11 and 5 in the first and second convolutional layers, respectively) with multiple 3×3 kernel-sized filters one after another. VGG-16 was trained for weeks on NVIDIA Titan Black GPUs.
A sequential model was built around a pre-trained VGG-16 model using the Python libraries Keras and TensorFlow, with all pre-trained layers frozen to allow transfer learning, as shown in figure 9. Freezing stops the pre-trained weights from being updated, which reduces training time.
2.3.3 Adding dense layer to VGG-16.
A final Dense layer with SoftMax activation is added to form a fully connected layer whose outputs lie in the range 0 to 1. In this project, these outputs indicate the classes.
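A model of this kind might be assembled in Keras roughly as follows. This is a sketch under stated assumptions rather than the project's actual code: the function name `build_model`, the use of `include_top=False` with a `Flatten` layer, the two-unit output and the SGD learning rate of 0.01 are all assumptions, and "binary cross-entropy" is one reading of the "Binary" loss mentioned in this report. The Random Normal weight initialiser, the zero bias and the seed of zero are taken from the text.

```python
INPUT_SHAPE = (224, 224, 3)   # 224x224 RGB images
NUM_CLASSES = 2               # 'Clear' and 'Obstructed'


def build_model(weights="imagenet"):
    """Frozen VGG-16 base with a softmax Dense layer on top."""
    from tensorflow import keras  # imported lazily; requires TensorFlow

    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=INPUT_SHAPE
    )
    base.trainable = False  # freeze all pre-trained layers

    model = keras.Sequential([
        base,
        keras.layers.Flatten(),
        keras.layers.Dense(
            NUM_CLASSES,
            activation="softmax",
            kernel_initializer=keras.initializers.RandomNormal(seed=0),
            bias_initializer="zeros",
        ),
    ])
    model.compile(
        optimizer=keras.optimizers.SGD(learning_rate=0.01),  # assumed rate
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_model().summary()` lists the trainable and non-trainable parameter counts, which shows that only the added Dense layer is trained.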
2.3.4 Using optimizers for better performance of model.
2.3.4.1 Optimisers used in the project.
Six different optimizers (Stochastic Gradient Descent (SGD), Adadelta, Adam, RMSprop, Adagrad and Nadam) were chosen, with seven different learning rates (0.0005, 0.0001, 0.005, 0.001, 0.05, 0.01, 0.1), to tune the model. The batch size is set to 32 and the number of epochs is set to 100 for training.
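The tuning runs can be enumerated as a simple grid. The sketch below assumes that every optimizer was paired with every learning rate, which the text suggests but does not state outright; the function name `training_configs` is hypothetical.

```python
from itertools import product

OPTIMIZERS = ["SGD", "Adadelta", "Adam", "RMSprop", "Adagrad", "Nadam"]
LEARNING_RATES = [0.0005, 0.0001, 0.005, 0.001, 0.05, 0.01, 0.1]
BATCH_SIZE = 32
EPOCHS = 100


def training_configs():
    """Return one configuration per (optimizer, learning rate) pair."""
    return [
        {
            "optimizer": name,
            "learning_rate": rate,
            "batch_size": BATCH_SIZE,
            "epochs": EPOCHS,
        }
        for name, rate in product(OPTIMIZERS, LEARNING_RATES)
    ]


configs = training_configs()   # 6 optimizers x 7 learning rates = 42 runs
```

Under this assumption the full grid amounts to 42 training runs of 100 epochs each.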
Stochastic Gradient Descent: Stochastic gradient descent is a variation of the gradient descent algorithm in which the coefficients are updated after each training instance, instead of at the end of a batch of instances. Each update is cheap, so learning usually progresses faster, and the training instances are visited in random order.
Adagrad: A gradient descent algorithm that adapts the learning rate per parameter. It is well suited to sparse data, since it uses low learning rates for parameters associated with frequently occurring features and high learning rates for infrequent ones. It takes a subset of the training data, computes the gradient, accumulates the squared gradients and uses them to scale the weight update.
RMSprop: RMSprop scales the learning rate using an exponential average of squared gradients. Like Adagrad, it takes a subset of the training data and computes the gradient and the squared gradient, but it applies a decay rate to the accumulated squares before updating the weights.
Adam: One of the most efficient algorithms; it computes a learning rate for each parameter. It takes into account exponentially decaying averages of both the gradients and the squared gradients, which estimate the first moment and the second moment. Because these averages start at zero, they are biased towards zero, so the weights are updated using bias-corrected versions of both. Adam can be seen as a combination of momentum and RMSprop.
Adadelta: A more robust extension of Adagrad. It adapts the learning rate based on a moving window of gradient updates instead of considering all past gradients. This allows learning to continue even after many updates.
Nadam: It incorporates the Nesterov accelerated gradient into Adam. It is beneficial for noisy gradients and for gradients with high curvature. With Nadam, the learning process is accelerated by combining the exponentially decaying moving averages of the previous and current gradients.
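To make the differences between these optimizers concrete, the update rules of four of them are sketched below for a single scalar weight. These are textbook forms with commonly used default hyperparameters (assumptions here), not the exact Keras implementations; Adadelta and Nadam are left out for brevity, and the function names are hypothetical.

```python
import math


def sgd_step(w, grad, lr):
    """Plain gradient step."""
    return w - lr * grad


def adagrad_step(w, grad, lr, state, eps=1e-8):
    """Scale the step by the root of the sum of all past squared gradients."""
    state["sum_sq"] = state.get("sum_sq", 0.0) + grad * grad
    return w - lr * grad / (math.sqrt(state["sum_sq"]) + eps)


def rmsprop_step(w, grad, lr, state, rho=0.9, eps=1e-8):
    """Like Adagrad, but with a decaying average of squared gradients."""
    state["avg_sq"] = rho * state.get("avg_sq", 0.0) + (1 - rho) * grad * grad
    return w - lr * grad / (math.sqrt(state["avg_sq"]) + eps)


def adam_step(w, grad, lr, state, beta1=0.9, beta2=0.999, eps=1e-8):
    """Decaying averages of gradient and squared gradient, bias-corrected."""
    state["t"] = state.get("t", 0) + 1
    state["m"] = beta1 * state.get("m", 0.0) + (1 - beta1) * grad
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])   # second moment
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


# One step on f(w) = w**2 from w = 1.0, where the gradient is 2.0.
w_sgd = sgd_step(1.0, 2.0, lr=0.1)
w_adam = adam_step(1.0, 2.0, lr=0.1, state={})
```

On the first step, plain SGD moves by the learning rate times the gradient, while Adam moves by approximately the learning rate itself, regardless of the gradient's size. This normalisation is what the bias-corrected moment estimates provide.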
The loss function used is binary cross-entropy (“Binary”), since the model is a two-class classifier. ‘Accuracy’ is the performance metric for evaluating the model. A Random Normal initializer is used to initialise the weights, and the bias is initialised to zero.
The seed is set to zero for all the models so that the same sequence of random numbers is obtained across trials. Validation and test datasets are set aside to evaluate the models. Loss and accuracy against epochs are recorded and plotted using the ‘matplotlib’ library.
2.3.4.2 Performance metrics used in the project.
Performance is measured using the confusion matrix and the accuracy score, which depend on how many images were classified correctly and how many were not.
The true positives and true negatives show the model's ability to classify the images correctly. The false positives and false negatives show where it misclassifies them, as shown in figure 10.
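The scores derived from a confusion matrix can be computed as in the sketch below, taking ‘plume visible’ as the positive class. The function name is hypothetical and the counts in the example are made up purely for illustration; they are not results from this project.

```python
def _ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = _ratio(tp, tp + fp)   # of predicted plumes, how many were real
    recall = _ratio(tp, tp + fn)      # of real plumes, how many were found
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


# Illustrative counts only (not taken from the project's results).
scores = classification_metrics(tp=8, fp=2, fn=1, tn=9)
```

For these example counts the accuracy is 0.85, the precision is 0.8 and the recall is 8/9. A false negative here corresponds to a plume that the model missed, which is the more costly kind of error for monitoring purposes.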
After the night images were discarded and the data was pre-processed, each input image was transformed to an image of size (224, 224) for data modelling, as shown in figures 11, 12 and 13.
Data augmentation was discarded because the images to be predicted have the same angle and orientation. More images were labelled instead, as mentioned in table 4 and table 5.
After pre-processing, six optimisers were applied to find the best validation and testing scores. Table 6 and Table 7 show the loss and accuracy scores, respectively, on the training, validation and testing datasets for the different optimizers.
Table 6 and Table 7 show that the sequential model performs best with the SGD optimiser, with an accuracy of 0.812 and a loss of 4.06. The loss and accuracy graphs for the training and validation sets for the sequential model with the SGD optimiser are shown in Figure 14 and Figure 15 respectively.
The loss graph shows that the loss tends to decrease over the epochs, with some spikes in between.
The accuracy graph shows that the accuracy tends to increase over the epochs, with some spikes in between. After this performance was obtained with SGD, the model was applied to unlabelled data to predict the classes of the images (as shown in the flowchart in Figure 16). The input is the image to be predicted, in ‘.jpg’ format, and the output is the name of the class to which the input image belongs. For example, if the input image contains a plume, the output will be ‘Clear’, as that is the name of the class containing images with plumes.
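The final step of this prediction flow, turning the model's output into a class name, can be sketched as follows. The function name is hypothetical, and the order of the classes in the model's output is an assumption; in practice it must match the order used when the training labels were encoded.

```python
CLASS_NAMES = ("Clear", "Obstructed")   # index order is an assumption


def predicted_class(probabilities, class_names=CLASS_NAMES):
    """Return the name of the class with the highest predicted probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best]


# Example softmax outputs (illustrative values, not real predictions).
label_with_plume = predicted_class([0.9, 0.1])
label_without_plume = predicted_class([0.2, 0.8])
```

With these example values the first image is labelled ‘Clear’ and the second ‘Obstructed’.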
The model was applied to predict the classes of 2788 unlabelled images. The confusion matrix (shown in Table 8) was obtained for all these images.
The average precision of the model is 0.95, with a recall of 0.94 and an F1 score of 0.94. This means that the model performs reasonably well and can be used for the prediction of plumes. However, the number of cases where a plume was present but the model did not recognise it is higher than the number of cases where there was no plume but the model reported one. This is a concern, as a missed plume might delay precautions ahead of a possible eruption when a plume was in fact visible.
4.1 Future scope
Our model has performed well and can be used for the prediction of plumes. However, since a large number of images is available, labelling more of them could improve performance further. Reducing the Type II error (missed plumes) through further optimisation can also be considered in the future.
Some examples of images with plumes.
Some examples of images with no plumes
Example of image having clouds in front of the focussed area.
Example of image having clouds in the background of the focussed area.
Example of an image where focussed area is not visible because of water droplets.
Trainable and untrainable parameters of VGG-16