The new computational method can, for example, analyze thousands of sound sources in an urban environment or war zone. The resulting visualizations allow the user to scan an audio recording at 200 times real-time speed, enabling them to discover unexpected, or anomalous, events.
Details of the work are reported in the journal Pattern Recognition Letters.
Project co-leader Mark Hasegawa-Johnson, a researcher at the Beckman Institute at the University of Illinois, says the software is designed to free up the analyst by having the computer perform certain tasks and render the data visually, for example as a spectrogram.
“The idea is to let the computer do what computers are good at and have the humans do what humans are good at,” Hasegawa-Johnson says. “So humans are good at inference, big picture, and anomaly detection. Computers are really good at processing hundreds of hours of data all at once and then compressing it into some format, into some image.”
To turn sound into an image, the researchers developed an efficient algorithm for simultaneously computing Fast Fourier Transforms (FFTs), a common signal-processing technique, at multiple “window” sizes. Each window contains frequency information that gives a specific snapshot of the input signal at that scale. To test the method, they applied the technology to an audio book.
“If you try to skim an audio book, if you try to speed it up by four times, you really can’t understand what it’s saying most times,” Hasegawa-Johnson explains. “But if you take the entire thing and plot it as a spectrogram you can actually plot it as some kind of signal summary of the entire three hours and get some information from one screen of data. From that one screen of data you can figure out what in the three hours you want to zoom into.”
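As a rough illustration of what computing FFTs at multiple window sizes means, here is a minimal Python sketch using NumPy. It is not the researchers' algorithm: it simply computes each spectrogram independently, one window size at a time, whereas the published method computes them simultaneously and more efficiently. The function names, the Hann window, the 50 percent overlap, and the window sizes are all assumptions chosen for the example.

```python
import numpy as np

def spectrogram(signal, window_size, hop=None):
    """Magnitude spectrogram: one row per time frame, one column per frequency bin."""
    hop = hop or window_size // 2            # assumed 50 percent overlap between frames
    n_frames = 1 + (len(signal) - window_size) // hop
    window = np.hanning(window_size)         # assumed Hann taper to reduce spectral leakage
    frames = np.stack([
        signal[i * hop : i * hop + window_size] * window
        for i in range(n_frames)
    ])
    # rfft returns window_size // 2 + 1 bins for real-valued input
    return np.abs(np.fft.rfft(frames, axis=1))

def multiscale_spectrograms(signal, window_sizes=(256, 1024, 4096)):
    """Naive multiscale analysis: a separate spectrogram for each window size."""
    return {w: spectrogram(signal, w) for w in window_sizes}

# Example input: two seconds of a 1000 Hz tone sampled at 8000 Hz
sample_rate = 8000
t = np.arange(sample_rate * 2) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

specs = multiscale_spectrograms(tone)
peak_hz = {}
for w, s in specs.items():
    # Convert the strongest average bin back to a frequency in Hz
    peak_hz[w] = s.mean(axis=0).argmax() * sample_rate / w
    print(w, s.shape, peak_hz[w])
```

Short windows give many time frames but coarse frequency bins, while long windows give few frames but fine frequency bins. That trade-off is why a summary spanning hours of audio benefits from more than one window size.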
The audio visualization research is part of a project funded by the National Science Foundation and Department of Homeland Security. The Illinois researchers have dubbed it “milliphone” because it represents turning a thousand sources of audio into a single visualization.
More news from the University of Illinois: www.beckman.illinois.edu