Signal Processing Taking Wing: A BRP Intern’s Perspective

The summer of 2018 was delightful and carefree. Long hours spent reading in the sun, dining out, and vaguely dreaming of grad school in the spring. However, inevitably, I began to feel restless. I was eager to continue exploring my chosen major: signal processing.

Signals are numbers changing with respect to a parameter – most commonly, time. Signals are everywhere we look. If you’ve ever recorded something in Audacity, FL Studio, or even on WhatsApp, you will have seen waveforms — they trace the variation of air pressure with respect to time. Just like music, noise is also a variation in air pressure and hence a signal. The electrical activity of the heart, which causes the muscles to pump blood, is a signal. Different regions of your brain are constantly sparking with neural activity, which can be represented as a signal. Signal processing involves converting signals to a convenient form and applying transformations, such as filtering, to make them easier to interpret.
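In code, a signal really is just an array of numbers indexed by (discrete) time. Here’s a minimal sketch in Python with NumPy – the 440 Hz tone and 8 kHz sampling rate are arbitrary illustrative choices:

```python
import numpy as np

# A 440 Hz sine tone sampled at 8 kHz for half a second:
# the "signal" is just an array of numbers indexed by time.
fs = 8000                      # sampling frequency in Hz
t = np.arange(0, 0.5, 1 / fs)  # discrete time axis, in seconds
x = np.sin(2 * np.pi * 440 * t)

print(len(x))  # 4000 samples
```

Playing those 4000 numbers back through a speaker at 8000 samples per second reproduces the tone – that’s all a digital recording is.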

Signals are interesting and everywhere! My experience in signal processing up to that point had been in human speech, and I was applying to intern at voice technology companies. Somehow, a recurring thought held me back: when would I next have six months to do absolutely whatever I wanted? Why not broaden my perspective? After all, it was for this very reason that I had taken up Spanish and Ultimate Frisbee. Why not another first?

I was flitting between LinkedIn pages, as these things often go, when my eyes suddenly rested on the word ‘Bioacoustics.’ I had no idea what that meant. Wikipedia defined it as “…a cross-disciplinary science that combines biology and acoustics…investigation of sound production, dispersion, and reception in animals.” A cross-disciplinary field that was interesting, had significant impact, and overlapped with my area of interest to open up a whole new world of learning and possibilities – that was precisely what I hadn’t known I was searching for! Exploring bioacoustics led me to the Bioacoustics Research Program, a.k.a. BRP, which is a wing (pun intended) of the Cornell Lab of Ornithology.

Going through BRP’s website, I marveled at how each project needed insight from different disciplines. A field like bioacoustics doesn’t just benefit from collaboration: it demands it. Here’s a glance at the process, from my own understanding —

  • Ecologists, wildlife experts, and research analysts who possess domain knowledge have to identify and describe the technology that meets their requirements for projects like soundscape analysis and acoustic monitoring.
  • Hardware engineers and software developers have to conceptualize and program the required devices. BRP, and apparently, several other bioacoustics labs, have designed customized recording devices to collect data from birds, whales, elephants, etc. You can check out one such camera/recorder – the live feeder watch camera at the Lab.
  • Now, the recording devices do their job and collect data 24/7. Here’s an idea of how much data that is: 24 hours/day * ~30 days/month = 720 hours/month. For those more familiar with signal processing: 30 recorders are stationed throughout Sapsucker Woods, each collecting a single channel of data at a sampling frequency of 48 kHz. That adds up to 48,000 samples/channel-second * 3600 seconds/hour * 1 channel/recorder * 30 recorders = 5,184,000,000 samples/hour. Multiply by 720 hours/month, and just for one month the team is looking at an array containing ~3.7 trillion numbers, which is…huge.
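The back-of-the-envelope arithmetic above can be checked in a few lines of Python, using the figures stated for the Sapsucker Woods array:

```python
# Data volume of the acoustic array, from the figures quoted above.
fs = 48_000          # samples per second per channel
recorders = 30       # recorders in Sapsucker Woods, one channel each
hours_per_month = 24 * 30  # = 720 hours

samples_per_hour = fs * 3600 * recorders
samples_per_month = samples_per_hour * hours_per_month

print(samples_per_hour)   # 5184000000  (~5.2 billion samples/hour)
print(samples_per_month)  # 3732480000000  (~3.7 trillion samples/month)
```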
  • Data is collected for different purposes. The specific application I was involved in was the bird sound recognition system. Here’s where deep learning comes into play: a trained model classifies a recording as a particular species. Now you know the amount of data it has to process!
Acoustic data collection array at the Cornell Lab of Ornithology

This is exactly what I am working to tackle. We could just feed this enormous amount of raw data into a system and say, “Deal with it!” Alternatively, we could streamline the data going into the recognizer to make its work more meaningful. Specifically: say there is a one-hour-long segment containing only the sound of a rushing river, or another segment of just two people conversing in the woods. When we humans hear such recordings, it is obvious that they contain no useful (bird-like) information. The recognizer, however, performs intricate computations only to assign the label of some avian species to the recording, which is misleading and, more importantly, meaningless. If we could split the recorded data into chunks and keep only those chunks that are ‘very likely to contain bird calls,’ that would be a crucial preprocessing step. Given a clip, if it can be said with reasonable confidence not to contain any bird sound – it could be noise, silence, or human speech – then it won’t have to be passed through the recognition algorithm at all. My task was to step in before the deep learning and take care of this preprocessing.
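To make the idea concrete, here is a deliberately naive sketch of such a gatekeeper – not BRP’s actual method, just a toy energy threshold; `chunk_seconds` and `energy_threshold` are made-up parameters:

```python
import numpy as np

def keep_candidate_chunks(signal, fs, chunk_seconds=10, energy_threshold=1e-3):
    """Split a recording into fixed-length chunks and keep only those whose
    mean energy exceeds a threshold. A toy stand-in for a real
    'very likely to contain bird calls' detector."""
    chunk_len = int(chunk_seconds * fs)
    kept = []
    for i in range(len(signal) // chunk_len):
        chunk = signal[i * chunk_len:(i + 1) * chunk_len]
        if np.mean(chunk ** 2) > energy_threshold:
            kept.append(chunk)
    return kept

# Example: one loud chunk and one near-silent chunk.
fs = 1000
loud = 0.5 * np.ones(10 * fs)
quiet = 1e-4 * np.ones(10 * fs)
chunks = keep_candidate_chunks(np.concatenate([loud, quiet]), fs)
print(len(chunks))  # 1 -- only the loud chunk survives
```

A real detector would of course need to be far smarter than raw energy (a rushing river is loud but bird-free), which is exactly why the problem is interesting.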

Since July 2018, I have been working remotely with Dr. Shyam Kumar Madhusudhana, a post-doctoral fellow in BRP. We turned out to have several things in common: we had both lived in Bangalore and Goa, and both had backgrounds in speech processing. We got along well, and have been communicating over video call, email, and GitHub. He has guided me throughout my first remote internship experience as we’ve analyzed acoustic data from birds. There are definitely several similarities between analyzing speech and bird sounds (or, to sound fancy, avian vocalizations). Both are converted to the frequency domain and, as a result, are analyzed in short chunks rather than as one whole signal.
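That frame-wise frequency analysis looks roughly like this in NumPy – the frame length, hop, and window choice below are typical illustrative values, not the specific parameters we used:

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (rows of a 2-D array)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

fs = 16000
x = np.sin(2 * np.pi * 1000 * np.arange(fs) / fs)  # 1 s of a 1 kHz tone
frames = frame_signal(x, frame_len=512, hop=256)

# Window each frame, then take its magnitude spectrum.
spectra = np.abs(np.fft.rfft(frames * np.hanning(512), axis=1))
print(frames.shape)   # (61, 512): 61 frames of 512 samples each
print(spectra.shape)  # (61, 257): one spectrum per frame
```

Stacking those per-frame spectra side by side is what produces a spectrogram, the workhorse representation for both speech and birdsong.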

I definitely learned a lot. I learned more about signal processing as we experimented with various filterbanks, such as mel and gammatone. I went through previous literature in the field to get a sense of the work done on this problem. Not to mention programming skills! I considered myself a decent programmer, but I was in for a surprise. Shyam has been a software developer for many years, and he pointed out several (so many!) ways in which my code could be improved. My biggest takeaway from this internship is to write more readable code and use vectorization wherever possible. I also learned the importance of version control and being organized.
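To show what I mean by vectorization, here is a small before-and-after example of my own (not taken from our actual codebase): computing the energy of a signal with a Python loop versus a single NumPy call.

```python
import numpy as np

x = np.random.randn(1_000_000)

# Loop version: accumulates the energy sample by sample. Slow in Python.
energy_loop = 0.0
for sample in x:
    energy_loop += sample * sample

# Vectorized version: same result in one line, and far faster,
# because the work happens in optimized C inside NumPy.
energy_vec = np.dot(x, x)

print(np.isclose(energy_loop, energy_vec))  # True
```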

Another important lesson was not to get too caught up with numbers; if numbers stayed only as numbers, we risked restricting ourselves to a world of abstraction. Data visualization is essential, and I used the statistical data visualization library, seaborn, to add more meaning (and color, who doesn’t like colorful plots?) to my work. Here are some ‘scatterplot matrices’ created with seaborn.

Scatterplot of 4 acoustic classes of bioacoustic data

The above plot represents sounds from four different classes – bird sounds, human speech, noise, and a ‘maybe’ class which may or may not contain bird sounds. For each recording, an ‘acoustic index’ is employed to extract information from the frequency domain. Five such acoustic indices, drawing from previous work, were computed and plotted in matrix form.
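As an illustration of what an acoustic index can look like, here is spectral entropy – a widely used index in the bioacoustics literature, though I’m not claiming it was one of the five we plotted. It is low for tonal sounds and high for broadband noise:

```python
import numpy as np

def spectral_entropy(x):
    """Normalized entropy of the power spectrum: near 0 for a pure tone
    (energy concentrated in one bin), near 1 for broadband noise."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    p = spectrum / spectrum.sum()
    p = p[p > 0]  # drop empty bins so log2 is well-defined
    return -np.sum(p * np.log2(p)) / np.log2(len(spectrum))

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 2000 * t)                    # pure tone
noise = np.random.default_rng(0).standard_normal(fs)   # broadband noise

print(spectral_entropy(tone) < spectral_entropy(noise))  # True
```

A handful of such scalar summaries per recording is what turns an hour of raw audio into a single point in the scatterplot matrix.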

Acoustic indices for 10 different recordings

Above is another matrix, this time from 10 random recordings of noise and bird calls taken off the Internet. An interesting application of visualization in this domain is identifying different species of birds via their calls. Each call has a distinct spectral signature, and I learned that birders often associate this distinct visual pattern with a call to better commit the sound to memory. Watch this video to learn more and try out a fun game. I often found myself listening to different bird sounds, such as in this video. It’s quite fascinating, even if you’re not into birding, to think about the kind of vocal production system they must possess to create such sounds.
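Those spectral signatures are usually viewed as spectrograms. Here is a minimal sketch with SciPy, using a synthetic rising chirp as a crude stand-in for a bird call (all parameters are illustrative):

```python
import numpy as np
from scipy.signal import spectrogram

fs = 22050
t = np.arange(fs) / fs
# A rising chirp: its spectrogram shows an upward-sweeping trace,
# loosely resembling the visual signature of many bird calls.
call = np.sin(2 * np.pi * (2000 + 3000 * t) * t)

f, times, Sxx = spectrogram(call, fs=fs, nperseg=512)
print(Sxx.shape)  # (257, n_frames): frequency bins x time frames
```

Plotting `Sxx` on a log scale with time on the x-axis and frequency on the y-axis gives the familiar picture birders learn to read.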

My definition of personal satisfaction is creating social impact through my work. I had always been under the impression that ‘social impact’ and my ‘profession’ would be mutually exclusive – that impact would be something I pursued on my own, on the side, because I wouldn’t find such a niche in the domain of signal processing and engineering. I am grateful for finding that niche and for the experience I had. I hope my run with BRP continues – I am eager to meet the whole team and explore Sapsucker Woods someday!