Exploring the data

The audio data is large in size and long in duration. A 10-second sample at 192kHz is 3.5MB, and each ACD generates 1.26GB every 6 hours. It is difficult for humans to listen to these files: all of the sounds are scrambled below 7kHz to obscure human voices, which makes the audio inherently noisy and disorienting. Spotting trends and shifts over long periods also requires an attentive form of listening that is very difficult and time-consuming.
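As a rough sanity check on those figures, the arithmetic can be sketched in the shell. This assumes 1GB means 1000MB and that every clip is the same size; the "one clip per minute" result is an inference from the two numbers quoted above, not something stated about the ACDs.

```shell
#!/bin/sh
# Back-of-envelope check of the figures quoted in the text.
clip_mb_x10=35        # 3.5 MB per 10-second clip, held in tenths of a MB
six_hour_mb=1260      # 1.26 GB per six hours, assuming 1 GB = 1000 MB

# Number of clips that fit in the six-hour total.
clips_per_six_hours=$(( six_hour_mb * 10 / clip_mb_x10 ))

# Six hours is 360 minutes, so this is the implied number of clips per minute.
clips_per_minute=$(( clips_per_six_hours / 360 ))

# Minutes of audio actually captured in those six hours (10 seconds per clip).
minutes_recorded=$(( clips_per_six_hours * 10 / 60 ))

echo "clips per six hours: $clips_per_six_hours"
echo "clips per minute:    $clips_per_minute"
echo "minutes of audio:    $minutes_recorded"
```

On these assumptions the totals are consistent with one 10-second clip being kept per minute, rather than continuous recording.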

We have developed an experimental application that allows us to extract the spectral energy across each 10-second segment and save these aggregated snapshots of spectra alongside one another. With this application, it is possible to rapidly ‘scrub’ through the spectral content while still using our ears to pick out areas of interest very quickly. When a spectral snapshot seems interesting or surprising, we can listen to the original sound file that produced it.

Another approach we have explored involves concatenating the files into 6-hour chunks and generating spectrograms of that entire period of time, as shown in the following figure:

Spectrogram spanning six hours of data from 12.00 am to 6.00 am on 13 June 2018 from ACD3. Towards the right-hand side of the image, an increase in spectral activity is noticeable. This matches what we would expect around 4.00 am on the morning in question: dawn was at 3:56 am in Edinburgh that day.

Long-duration spectrograms like this give an instant overview of a period that a researcher may find interesting. Whole days and different ACDs can quickly be compared, and trends, anomalies and other features (such as sounds in the ultrasonic band) can be noticed at a glance. Researchers who want to explore more closely can then home in on specific files and generate close-up spectrograms of particular areas of interest. With several of these images lined up, researchers can look longitudinally across the season or seasons, laterally across the day and spatially across however many boxes are making recordings.
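One way to produce such a close-up is sketched below. It assumes SoX (the tool used in the garden experiments later in these posts) and is not necessarily how the figure above was made. The helper name `closeup`, its arguments and the file names in the usage line are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: render a close-up spectrogram of part of a recording.
# Usage: closeup INPUT.wav START_SECONDS LENGTH_SECONDS OUTPUT.png
# SOX can be overridden (e.g. SOX=echo) to preview the command without running it.
closeup() {
    # trim keeps LENGTH seconds starting at START; -n discards the audio output,
    # so the only result is the PNG written by the spectrogram effect.
    "${SOX:-sox}" "$1" -n trim "$2" "$3" spectrogram -x 2500 -o "$4"
}
```

For example, `closeup six_hours.wav 1200 60 closeup.png` would render one minute of audio starting 20 minutes into the joined file. Note that the offset is a position within the joined file, which is not the same as wall-clock time if the clips are not contiguous.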

Suspicious Noise in the Night

Shifting gear

I am in the thick of developing the sound installation for this project, which will reveal some of the concepts behind our work and present some of the sounds that will be captured by Simon Chapple’s sensor network. I’ll explain more about that in another post soon. Meanwhile, I’m taking a break from thinking about gel loudspeakers and the compass and heading settings on mobile phones to say a little about my experience working with Simon’s Raspberry Pi, wireless, 192kHz audio beacon prototype earlier this year.

Simon lent me his prototype so that I could hear what was going on in my garden in late January and run some noise profiling tests. I was keen to see whether the small mammals that must live in the garden are interested in, and active around, our compost heap. I dutifully positioned the sensor box where I hoped I’d hear mice and other mammals fighting over leftover potato peelings but sadly, as far as I can tell at least, there was nothing of the sort: no fights or mating rituals at this time of year. The absence of activity is useful, since it suggests that there has been plentiful food for small mammals to find earlier in the year or day, and that they’re not risking the wind, rain, snow and something new in the garden to get a late-night snack. However, a largely quiet night means that the few sonic events are all the more interesting and easy to spot.

A word on the tools I’m using

I’ve been using SoX to generate spectrograms of the 10-second audio clips collected. It’s a good way to quickly inspect if there is something of interest to listen to. With over 9 hours of material though, it’s not interesting to listen to the whole night again. Instead, I first joined all of the files together using a single command line from SoX in the terminal window on OS X:

sox *.wav joined.wav

I then generated a spectrogram of that very long .wav file. However, the image of a 9-hour file needs to be massive to show any interesting detail. Instead, I grouped the files into blocks of an hour and rendered a 2500-pixel spectrogram of each 10-second burst. It’s then very quick to scroll down through the images until something interesting appears. Here’s the .sh script I used:

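### Generate a spectrogram for each WAV file
for file in *.wav; do
   outfile="${file%.wav}.png"    # assumed: image named after the clip
   title_in_pic="${file%.wav}"   # assumed: clip name used as the title
   sox "$file" -n spectrogram -t "$title_in_pic" -o "$outfile" -x 2500
done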

The above script was hacked from this GitHub gist by hrywlms.
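The hourly grouping mentioned above is not shown in the script, so here is a minimal sketch of how it could be done. It assumes each clip is named after its start time, for example `2018-01-27_01-43-10.wav`; that naming scheme and the function names are hypothetical, so `hour_key` would need adjusting to match the real file names.

```shell
#!/bin/sh
# Assumption: clips are named by start time, e.g. 2018-01-27_01-43-10.wav.
# This naming scheme is hypothetical; adjust hour_key to match the real names.

# Print the hour key (YYYY-MM-DD_HH) for one clip path.
hour_key() {
    basename "$1" | cut -c1-13
}

# Move every .wav clip in the given directory into one sub-folder per hour.
group_by_hour() {
    for f in "$1"/*.wav; do
        [ -e "$f" ] || continue            # no clips: the glob stayed literal
        dest="$1/$(hour_key "$f")"
        mkdir -p "$dest" && mv "$f" "$dest/"
    done
}
```

Running `group_by_hour .` in the folder of clips leaves one folder per hour, and the spectrogram loop can then be run inside whichever hour looks worth inspecting.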

Something suspicious

From the rendered spectrograms, I can quickly see that there were some interesting things happening at a few points in the night and can zoom in further. For example, this moment at 1:43 am looks interesting:

Something suspicious at 1:43 am

It sounds like this:

I suspect that this is a mouse, cat or rat. Anyone reading this got a suggestion as to what it might be?

As the night drew on, the expected happened and birds began to appear — the first really obvious call was collected at 6:23 am. It looks like this:

First bird call 6:23 am

And it sounds like this:


If you’re able to hear this audio clip, then you’ll be aware of the noise that is encoded into the file. One of our challenges going forwards is how to reduce and remove this. I’ve tried noise profiling and attempted to reduce the noise in the spectra, but this has affected the sounds we want to hear and interpret. Basically, by removing the noise, you also remove parts of the sound that are useful. I’m reflecting on this and think that there may be ways to improve how power is delivered from the battery to the Raspberry Pi in Simon’s box, and I wonder whether we need some USB cables with capacitors built in to stop the noise.

However, noise reduction may not be as important to others as it is to me. My speciality is sound itself and, in particular, how things sound. I want to get rid of as much unnecessary noise as possible so that when I’m noisy, it’s intentional. For an IoT project, though, the listener isn’t going to be a human but a computer. Computers will analyse the files, computers will detect if something interesting is happening and computers will attempt to work out what that interesting thing is, based on things they’ve been told to look for. The noise very quickly makes these sounds irritating and hard for a human to engage with, but it may well be perfectly fine for a computer to deal with. Let’s see.
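For readers who want to try the noise profiling step themselves, here is a minimal sketch using SoX’s `noiseprof` and `noisered` effects. It is not a record of the exact commands used for the tests above. The helper name `denoise`, the file names and the default strength of 0.2 are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: profile the noise in one clip and subtract it from another.
# Usage: denoise NOISE_ONLY.wav INPUT.wav OUTPUT.wav [AMOUNT]
# AMOUNT is the noisered strength from 0 to 1 (SoX itself defaults to 0.5);
# lower values remove less noise but also damage less of the wanted sound.
# The 0.2 fallback here is an arbitrary starting point, not a recommendation.
# SOX can be overridden (e.g. SOX=echo) to preview the commands.
denoise() {
    prof="${3%.wav}.prof"
    # Step 1: build a noise profile from a clip that contains only background noise.
    "${SOX:-sox}" "$1" -n noiseprof "$prof" || return 1
    # Step 2: apply that profile to the clip of interest.
    "${SOX:-sox}" "$2" "$3" noisered "$prof" "${4:-0.2}"
}
```

A call such as `denoise quiet.wav bird.wav bird_clean.wav 0.3` makes the trade-off described above easy to explore: raising the last argument strips more noise, and more of the bird call along with it.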

Field-testing the Prototype Device

An earlier post described my initial steps in building an audio monitoring device. Over the last couple of weeks, I have worked on putting the electronics inside an enclosure that is both waterproof and not too obtrusive when installed in a tree. We refer to it as the “bird-box”. The box is made largely of 3mm plywood, with some thicker wood framing. It’s been stained and varnished to weatherproof it. The design allows separate, easy access to change the battery without dislodging the Raspberry Pi Zero W processor and the Ultramic. On the inside, we use hermetically sealed plastic lunch boxes to hold the sensitive electronics, with sealed punch-throughs for the various connecting cables. It’s cheap and very effective.

Our next step was to carry out some field-testing of the device. We decided to do this in the private garden of a University of Edinburgh property, close enough to the Meadows to capture representative samples of sounds in the environment. I installed a temporary WiFi access point in the building to pick up the data from the prototype device in the garden, which is collected on a laptop also sited within the building.

The recording device placed outside for 72 hours in the garden

Here’s a small sample of what we recorded over the three days of wind, snow, rain and freezing temperatures. The unit, including the 30,000 mAh power bank, performed well in these challenging conditions.

This audio sample is indicative of the kinds of things we can detect in the urban environment: an emergency siren in the background, a stonemason working on a nearby building, and a snatch of bird song. The spectrogram below illustrates the different frequency ranges at which the sounds occur, from 0kHz up to 20kHz.

  • The bottom pink line is ambient sound.
  • The faint wavy pink line above that is the siren.
  • The strong pink fence-like pattern above that is the sound of the stonemason tapping away.
  • Finally, the little pink burst (between 3kHz and 5kHz) just before the last two taps from the stonemason is the clearly-audible bird song.

Listen again whilst looking at the image and you can observe how the sounds interact with each other.
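A 192kHz recording contains frequencies up to 96kHz, so a plot that stops at 20kHz has to be restricted in some way. The sketch below shows one way to do that with SoX, by resampling to twice the highest frequency of interest before drawing the spectrogram. This is an assumption about how such an image could be produced, not a description of how the one above was made, and the function and file names are hypothetical.

```shell
#!/bin/sh
# Sample rate needed so that the top edge of the spectrogram sits at MAX_HZ.
# A signal sampled at R Hz can only represent frequencies up to R/2 (Nyquist).
rate_for_max_freq() {
    echo $(( $1 * 2 ))
}

# Hypothetical helper: render a spectrogram limited to 0..MAX_HZ.
# Usage: band_spectrogram INPUT.wav MAX_HZ OUTPUT.png
# SOX can be overridden (e.g. SOX=echo) to preview the command.
band_spectrogram() {
    # rate resamples the audio first; -n discards the resampled audio itself.
    "${SOX:-sox}" "$1" -n rate "$(rate_for_max_freq "$2")" spectrogram -o "$3"
}
```

For instance, `band_spectrogram sample.wav 20000 sample.png` resamples to 40kHz, which gives a frequency axis running from 0kHz to 20kHz.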

We are excited to see that the recording device, the WiFi router and the computer all seem to be working together well.