tl;dr
I'm not a good programmer but I think I do interesting things. One of those things is researching volcanic eruptivity. Magma pushing up creates earthquakes and I use sensor readings of those earthquakes to do my research.
the code I need extract the features I need is:
import <libraries>
#read in the raw data
input_df = pd.read_csv("../Data/Specimen_Event.csv", index_col="Seconds")
#The output from the raw data isn't too useful, so let's look at the constituent frequencies
fft_df = input_df.copy()
y_values = np.fft.fft(input_df.Amplitude.values)
no_of_datapoints = len(y_values)
time_interval = 0.01
yf_values = 2.0/no_of_datapoints * np.abs(y_values[:no_of_datapoints//2])
x_values = fftfreq(no_of_datapoints, d=time_interval)
xf_values = fftfreq(no_of_datapoints, d=time_interval)[:no_of_datapoints//2]
#The frequency data is useful, but we've lost all temporal information - let's just take time slices and transform those, and glue it back together like a histogram - aka a seismogram
window_size = 256
recording_rate = 100
frequencies, times, amplitudes = signal.spectrogram(input_df.Amplitude.values, fs = recording_rate, window='hanning', nperseg = window_size, noverlap= window_size - 100, detrend= False, scaling="spectrum")
decibels = 20 * np.log10(amplitudes)
f, ax = plt.subplots()
ax.pcolormesh(times, frequencies, decibels, cmap="viridis")
This shows me both events, and a hidden pattern at 38Hz. This can be used to conjecture that we have a magma chamber of around ~150 metres.
Where do I start?
I'll try to start this post like I try to start my work - by managing expectations and being honest about my capabilities. I'm not a 10x engineer (if you believe in that myth) or anything near it, nor am I a laser-focused senior dev with years of open-source contributions. I won't change the way you code or the way you approach your work.
What I am, instead is a fence-sitter with a habit of flip-flopping between industry and academia; enough experience to know when I'm writing honkingly bad code (even if I can't stop myself from doing it the first place), but enough time in research to get to apply my mediocre skills onto something a bit different - in particular, volcanoes!
So, how exactly does volcanology dip into programming? Well, over the years, there have been breakthroughs in how we observe volcanic unrest - and more than that, changes in how we measure it too, and that's the crucial part. to misquote Arnie: "If it quantifies, we can analyse it".
So, what are we measuring? It's hard to peer inside a volcano, and we can't drill into an active volcano to get sample data. But we do have some proxies we can measure instead that give us hints, and one of the best ones is something called seismicity - the earthquake activity from the volcano. We can't have an eruption without magma getting to the surface (or at least somewhat close), but there are no nice easy pre-drilled pipes for them to follow. Every inch of vent has to be split from the surrounding rock, sometimes for kilometres.
This injection of magma is quite an energetic and noisy affair and one we can pick up. These small fractures create very shallow, distinct, localised earthquakes that we can pick up with seismometers - tilt/accelerometers measuring movement in the ±x, ±y, and ±z axes.
Cut the crap, show me the code
So, what's the point of this post? Well, it's all very well knowing that these earthquakes are happening, but how do we understand them?
We start where we always do, by looking at the data, but first, let's get the libraries we'll need in.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from numpy.fft import * # a library for carrying out Fourier Transforms
from scipy import signal # further FFT functionality
for those of you who want to code along at home, I've included pointers to the specimen data and a notebook at the bottom
Now, we need to get our data in. It's available as a simple time-series csv, which makes ingestion very easy, and as we're only loading in one example we're not worried about scaling this up, so we have no reason not to use pandas
and make our lives easier.
Once we have our data loaded, we'll plot the raw data quickly.
input_df = pd.read_csv("../Data/Specimen_Event.csv", index_col="Seconds")
input_df.plot(lw = 0.3)
Which gives us
We seem to be lucky enough to have two events in the same 300-second window; you can see one smaller event, followed by a larger one. But it's difficult to tell more than that for sure. We need a better way to understand the sensor readings.
What we saw in the plot was amplitude against time. When we treat the same data as a sound file and listen to it, we wouldn't just hear a rumble, but we'd be able to pick out deeper and higher frequency parts to it too. Thankfully, there is a way to mathematically pick out frequencies - the Fourier Transformations.
I'll leave the detailed explanation for a later post, but suffice to say that we can turn an amplitude vs. time plot, where we see the amount of energy recorded once per 0.01 seconds, to an amplitude vs. frequency plot, where we see the amount of energy across the datafile per frequency. You usually hear people refer to these as being in either the temporal- or time-domain and the frequential- or frequency-domain.
We're using the SciPy/NumPy
library's fast Fourier transform functions to convert our event into the frequency domain.
fft_df = input_df.copy()
y_values = np.fft.fft(input_df.Amplitude.values)
no_of_datapoints = len(y_values)
time_interval = 0.01 #the seismometers for this recording sample 100 times a second
yf_values = 2.0/no_of_datapoints * np.abs(y_values[:no_of_datapoints//2])
x_values = fftfreq(no_of_datapoints, d=time_interval)
xf_values = fftfreq(no_of_datapoints, d=time_interval)[:no_of_datapoints//2]
fig, ax = plt.subplots()
ax.plot(xf_values, yf_values, lw=0.3)
ax.set_xlabel("Frequency [Hz]")
ax.set_ylabel("Amplitude")
ax.set_title("Fast Fourier Transform for Specimen event")
plt.show()
We'll go through the steps here in detail in a later post, but in essence, we are calculating the amplitude by an arbitrary periodicity, but we need to independently work out the actual frequencies we're seeing (our x-axis, the frequencies all dependent on the number of data points and the rate of recording) using the fftfreq
function. There's a little juggling of having to feed in a halved, even-length range. Again, more on this in a later post.
This leaves us with this plot:
We can now see the distribution of frequencies in the event. There is a lot of low frequency rumbling between the 1 - 30 Hz region. The human hearing range starts at around 20Hz, so this is relatively low. But there is a trend.
The only problem now is that by moving the data into the frequency domain, we've lost any information about how the event changes over time. The answer to this is to slice the event up into narrow, slightly overlapping windows, and running the same Fourier transformations over these windows individually. We can then stack these together as a form of a histogram with both frequency and time information, known as a spectrogram.
There is a pay-off though - the shorter a sample, the less accurate and sensitive a Fourier transform is, but longer samples leave us with a poorer time resolution. One other thing we need to keep in mind is that it's computationally more efficient to use 2^n window sizes.
We could do this manually with the same code we ran before, but we'll use a short-cut: a prebuilt SciPy library that will run the transforms and prep them for us to visualise.
window_size = 256
recording_rate = 100
frequencies, times, amplitudes = signal.spectrogram(input_df.Amplitude.values, fs = recording_rate, window='hanning', nperseg = window_size, noverlap= window_size - 100, detrend= False, scaling="spectrum")
decibels = 20 * np.log10(amplitudes)
f, ax = plt.subplots()
ax.pcolormesh(times, frequencies, decibels, cmap="viridis")
ax.set_ylabel("Frequency $kHz$")
ax.set_xlabel("Time $s$")
ax.set_title("Fast Fourier Transform for Specimen event")
plt.show()
So what?
Well, the effort of recapturing the time trends of the data in the frequency domain paid off. What we can see in the last graph, is time progressing as you move right, and the recording getting higher in frequency as you go up - the brighter the colour, the more energy. We can see both events quite clearly, but there's one pattern that's less obvious, but more exciting.
If you haven't spotted it, there is a band of information at around ~38Hz that gets quite active after 150 seconds. It's difficult to be confident of what we're seeing, but we have two possibilities. The more mundane answer is that there is some background noise - like storms in the ocean, that has resonated up into this band - a frequency-echo. A signal sitting so consistently at ~38Hz would support this mechanism.
However, ocean noise is usually no higher than ~0.14Hz. Finding it as high as 38Hz is unlikely. While there are many other sources of noise, there aren't many that are that consistent as to stay at precisely that frequency range.
The other answer is that after these more significant events, magma often resettles and moves into spaces in the new cracks, and the pressure in the chamber resettles. As the signal appears more after the events and it's quite continuous, it's more likely to be the sloshing of magma than to be a single brittle failure.
Let's go on a last flight of fancy. I am not claiming that this last part is anything but conjecture, so sceptics avert your eyes!
Oddly, the signal is so consistent in frequency, so it might be that it's acting like the wind across a beer bottle. When the speed of the wind is fast enough across the aperture it whistles and resonates. In our case, it might be the other way around - that the aperture is the right size for the pressure waves.
So perhaps we can derive something about the shape of the magma chamber? If the chamber is a particular length or breadth, it'll resonate at a specific frequency related to its wavelength.
We know that the frequency f, and we want the wavelength lambda. The last thing we need to know is the speed of sound through magma - and while it's a varying number, for a back of the envelope calculation let's say 5km per second.
Let's do the maths:
So to round it off, we can estimate that some dimension of the chamber or vent or pipe that was fracturing, was around 131 metres in dimension - all from an audio clip!
Resources
For those of you wanting to play along at home, you can find the data here and a notebook here
Top comments (1)
Hi Richard Strange! I really thanks for your post. This is really hopeful for me. I would like to ask you that how can we determine window size is 256 or is just a random number of any power of 2. Thank you!