
Classify Insects on the Fly

The history of humankind is intimately connected to insects. Insect-borne diseases kill millions of people, and insect pests destroy tens of billions of dollars’ worth of crops annually. At the same time, beneficial insects pollinate the majority of the crop species that we eat. Given the importance of insects in human affairs, it is surprising that computer science has not had a larger impact on entomology.

Yanping believes that recent advances in sensor technology and machine learning are about to change this. In particular, she proposed using inexpensive sensors to capture the flight sounds of insects, and building a robust classifier that automatically and accurately classifies flying insects from those sounds.

The idea of automatically classifying insects using the incidental sound of their flight dates back to the very dawn of computers and commercially available audio recording equipment 3. However, little progress has been made on this problem in the intervening decades. She attributes the lack of progress to three related factors: the lack of effective sensors makes data collection difficult, resulting in poor-quality and limited data; poor-quality data leads to inaccurate classifier models; and classifiers built on limited data tend to overfit and generalize poorly when classifying insects they have not seen.

Yanping has largely solved all of these problems. She uses optical sensors (designed in her lab) to record the “sound” of insect flight from meters away, with invariance to interference from wind noise and ambient sounds. These sensors have allowed her to record on the order of millions of labeled training instances, far more data than all previous efforts combined, and to avoid the overfitting that has plagued previous research efforts. By taking advantage of “The unreasonable effectiveness of data” 2 with the enormous amounts of data collected, she has built a simple, accurate and robust classification framework.
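To make the idea of classifying insects from flight sound concrete, here is a minimal sketch in Python. It is not Yanping's framework: the feature (dominant wingbeat frequency), the 1-nearest-neighbor rule, the synthetic signals, the class names `species_A` and `species_B`, and their mean frequencies are all assumptions chosen purely for illustration.

```python
import numpy as np

SAMPLE_RATE = 16_000   # Hz; assumed sampling rate for this illustration
CLIP_SECONDS = 0.1     # length of each synthetic recording

def synth_wingbeat(freq_hz, rng):
    """Synthetic stand-in for a flight recording: a tone with two harmonics plus noise."""
    t = np.arange(int(SAMPLE_RATE * CLIP_SECONDS)) / SAMPLE_RATE
    tone = sum(np.sin(2 * np.pi * freq_hz * k * t) / k for k in (1, 2, 3))
    return tone + 0.3 * rng.standard_normal(t.size)

def dominant_frequency(signal):
    """Frequency (Hz) of the largest peak in the Hann-windowed magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(signal.size)))
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / SAMPLE_RATE)
    return float(freqs[np.argmax(spectrum)])

def make_dataset(rng, n_per_class):
    """Hypothetical two-class data; the mean frequencies below are made up."""
    classes = {"species_A": 400.0, "species_B": 600.0}
    feats, labels = [], []
    for label, mean_freq in classes.items():
        for _ in range(n_per_class):
            freq = rng.normal(mean_freq, 20.0)  # individual variation
            feats.append(dominant_frequency(synth_wingbeat(freq, rng)))
            labels.append(label)
    return np.array(feats), np.array(labels)

def predict_1nn(train_x, train_y, query):
    """Return the label of the training example whose frequency is closest."""
    return train_y[int(np.argmin(np.abs(train_x - query)))]

def evaluate(n_train_per_class=20, n_test_per_class=50, seed=0):
    """Fraction of held-out synthetic recordings that are labeled correctly."""
    rng = np.random.default_rng(seed)
    train_x, train_y = make_dataset(rng, n_train_per_class)
    test_x, test_y = make_dataset(rng, n_test_per_class)
    preds = np.array([predict_1nn(train_x, train_y, q) for q in test_x])
    return float(np.mean(preds == test_y))

if __name__ == "__main__":
    print(f"held-out accuracy on synthetic data: {evaluate():.2f}")
```

Real recordings are far messier than these synthetic tones, which is where large amounts of labeled data become important; the sketch only shows the shape of the pipeline (record, extract a spectral feature, compare against labeled examples).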

Yanping has made all code, data, and supplemental materials freely available on her webpage 1, and will give sensors for free to any researcher who is interested in them. With the inexpensive sensors and the robust software, her work will give researchers worldwide the tools to accelerate their research.

1 Chen Y, project webpage:
2 Halevy A, Norvig P, Pereira F (2009) The unreasonable effectiveness of data. IEEE Intelligent Systems 24(2): 8–12
3 Kahn MC, Celestin W, Offenhauser W (1945) Recording of sounds produced by certain disease-carrying mosquitoes. Science 101: 335–336