Acoustic and imagery sensing

Main contributors from the group: Wenjie Luo (topic coordinator), Qun Song, Chaojie Gu, Zhenyu Yan, Duc Van Le, Siyuan Zhou, Jiale Chen, Qiping (Joy) Yang

Image credit: Trajectory Magazine

The acoustic and imagery sensing modalities provide rich information about the sensed target. As a result, individual acoustic/imagery samples are often large, and processing them to extract useful information requires intensive computation. These characteristics introduce various system challenges in data acquisition, computing, storage, and network transmission on networked sensing platforms with constrained processing capability, limited network bandwidth, and bounded energy supply.

Dr. Rui Tan’s early research designed efficient systems for 1) networked seismic sensing, which poses similar challenges [1, 2, 3], and 2) visual sensing on robotic fish for monitoring aquatic debris [4] and harmful aquatic processes (e.g., oil spills and harmful algal blooms) [5]. The group is currently developing an acoustic echolocation system based on deep learning, as well as networked imagery sensing for inspection tasks in manufacturing systems. This page briefly describes a completed preliminary work that uses inaudible echoes for room recognition.

Room recognition using inaudible echoes

Location awareness is increasingly needed by mobile applications. As of November 2017, 62% of the top 100 free Android apps on Google Play required location services. While GPS provides outdoor locations with satisfactory accuracy, determining indoor locations remains a hard problem. Our work designs a practical room-level localization approach for off-the-shelf smartphones using only their built-in audio systems. Room-level localization is desirable in a range of ubiquitous computing applications. For instance, in a hospital, knowing which room a patient is in is important for responsive medical aid when the patient develops an emergency condition (e.g., fainting). In a museum, knowing which exhibition chamber a visitor is in can greatly assist the automation of the multimedia guide, which is now often provided as a mobile app. In a smart building, room-level localization of occupants can help automate lighting and air conditioning to improve energy efficiency and occupant comfort.

Active acoustic sensing

In our approach, a smartphone uses its loudspeaker to transmit a 2-millisecond narrowband acoustic signal at a frequency (e.g., 20 kHz) beyond the human hearing limit and uses its microphone to capture the reverberation from the indoor environment (typically, a room) for 100 milliseconds. Based on the spectrogram of the 100-millisecond reverberation, our approach recognizes the indoor environment with a machine learning algorithm. The short audio recording time (i.e., 100 milliseconds) helps preserve the user’s privacy. However, the environment’s response to such a short, band-limited acoustic excitation may contain limited information about the environment. To address this, we employ deep learning to train a convolutional neural network for room recognition.
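To make the sensing pipeline concrete, the following sketch outlines the excitation and feature-extraction steps in Python. The sampling rate, FFT window sizes, and the play/record helpers are illustrative assumptions rather than the exact parameters and APIs used in our implementation.

```python
# Minimal sketch of the active acoustic sensing pipeline described above.
# Parameter values (sampling rate, window sizes) and the play/record helpers
# are illustrative assumptions, not the exact implementation from the paper.
import numpy as np
from scipy.signal import spectrogram

FS = 48_000          # assumed sampling rate (Hz)
F_TONE = 20_000      # inaudible excitation frequency (Hz)
T_TONE = 0.002       # 2-millisecond excitation
T_RECORD = 0.1       # 100-millisecond reverberation recording

def make_excitation():
    """Generate the 2 ms narrowband tone at 20 kHz."""
    t = np.arange(int(FS * T_TONE)) / FS
    return np.sin(2 * np.pi * F_TONE * t).astype(np.float32)

def reverberation_spectrogram(recording):
    """Spectrogram of the 100 ms reverberation, used as the classifier input."""
    f, t, sxx = spectrogram(recording, fs=FS, nperseg=256, noverlap=192)
    return np.log1p(sxx)  # log-compress for better dynamic range

# Hypothetical usage with platform-specific audio I/O (play/record not shown):
#   play(make_excitation()); recording = record(seconds=T_RECORD)
#   features = reverberation_spectrogram(recording)
#   room_id = cnn_model.predict(features[np.newaxis, ..., np.newaxis])
```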

We conducted extensive experiments to evaluate our approach. The results show 99.7%, 97.7%, 99%, and 89% accuracy in differentiating 22 residential/office rooms, 50 residential/office rooms, 19 spots in a quiet museum, and 15 spots in a crowded museum, respectively. The figures below show several example room types and university tutorial rooms with similar appearances that were used in our evaluation, as well as the 19 and 15 spots in the two museums we experimented with. The results were published at ACM UbiComp’18 [6] (PDF).

Examples of several room types

Examples of similar rooms

Imagery sensing for manufacturing systems

The group is developing a deep neural network-based imagery sensing system and evaluating it on a manufacturing production line. More details will be provided later.

Rescuing speech recognition from microphone heterogeneity

Run-time domain shifts from the training-phase domains are common in sensing systems built with deep learning. The shifts can be caused by sensor characteristic variations and/or discrepancies between the design-phase model and the actual model of the sensed physical process. In cyber-physical sensing applications, both the monitored physical processes and the sensing apparatus are often governed by certain first principles. In this work, we exploit such first principles as a form of prior knowledge to reduce the demand for target-domain data in model transfer. We propose a new approach called physics-directed data augmentation (PhyAug). Specifically, we use a small amount of data collected from the target domain to estimate the parameters of the first principle governing the domain-shift process, and then use the parametric first principle to generate augmented target-domain training data. Finally, the augmented target-domain samples are used to transfer or retrain the source-domain DNN.
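As a summary of these steps, the following skeleton sketches the PhyAug workflow. The function names and the callback-style decomposition are assumptions for exposition only, not the actual implementation.

```python
# Illustrative skeleton of the PhyAug workflow; the callback-style
# decomposition and names are assumptions for exposition only.

def phyaug_transfer(source_dataset, target_probe_data, model,
                    estimate_first_principle, apply_first_principle, retrain):
    """Transfer a source-domain DNN with physics-directed data augmentation.

    source_dataset           -- (sample, label) pairs from the source domain
    target_probe_data        -- small amount of data probed in the target domain
    model                    -- source-domain DNN to transfer
    estimate_first_principle -- fits the parameters of the governing first principle
    apply_first_principle    -- maps a source-domain sample into the target domain
    retrain                  -- retrains/fine-tunes the model on a dataset
    """
    # Step 1: estimate the parameters of the first principle from the probe data.
    params = estimate_first_principle(target_probe_data)

    # Step 2: translate every source-domain sample into the target domain.
    augmented = [(apply_first_principle(x, params), y) for x, y in source_dataset]

    # Step 3: retrain (or fine-tune) the DNN on the augmented data.
    return retrain(model, augmented)
```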

We apply PhyAug to two acoustic sensing applications and quantify the performance gains over other transfer learning approaches. The first and second case studies adapt DNNs for keyword spotting (KWS) and automatic speech recognition (ASR), respectively, to individual deployed microphones, where the domain shifts mainly stem from the microphones’ hardware characteristics. Our tests show that microphone heterogeneity can cause absolute accuracy drops of 15% to 35%, depending on microphone quality. Instead of collecting training data with the target microphone, PhyAug uses a smartphone to play a 5-second white noise clip and estimates the frequency response curve of the microphone from the recorded noise. Using the estimated curve, the existing samples in the factory training dataset are transformed into new training samples, which are then used to transfer the DNN to the target microphone’s domain through retraining. Experiment results show that PhyAug recovers the microphone-induced accuracy loss by 53%-72% for KWS and 37%-70% for ASR. PhyAug also outperforms existing approaches, including FADA, a domain adaptation approach based on adversarial learning, and Mic2Mic and CDA, which are designed specifically to address microphone heterogeneity. Note that the KWS and ASR models differ significantly in depth and complexity: the DeepSpeech2 ASR model has 86.6 million weights, about 175 times more than the KWS CNN.
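For the microphone case study, this workflow amounts to estimating an amplitude frequency response from the white-noise recordings and reshaping source-domain audio with it. The sketch below illustrates one way to do this in Python; the sampling rate, Welch averaging, and STFT window sizes are assumptions for illustration and may differ from our actual procedure.

```python
# Illustrative sketch of the microphone case study: estimate the target
# microphone's frequency response from a short white-noise recording, then
# reshape source-domain audio with it. Window sizes and the use of Welch
# averaging are assumptions for illustration.
import numpy as np
from scipy.signal import welch, stft, istft

FS = 16_000  # assumed sampling rate of the speech pipeline (Hz)

def estimate_frequency_response(ref_noise, target_noise, nperseg=512):
    """Amplitude response: ratio of target-mic to reference noise spectra."""
    _, p_ref = welch(ref_noise, fs=FS, nperseg=nperseg)
    _, p_tgt = welch(target_noise, fs=FS, nperseg=nperseg)
    return np.sqrt(p_tgt / np.maximum(p_ref, 1e-12))

def transform_to_target_domain(speech, response, nperseg=512):
    """Apply the estimated response to a source-domain waveform."""
    _, _, z = stft(speech, fs=FS, nperseg=nperseg)
    z_shifted = z * response[:, np.newaxis]   # per-frequency-bin scaling
    _, augmented = istft(z_shifted, fs=FS, nperseg=nperseg)
    return augmented
```

The transformed waveforms (or features derived from them) then stand in for data recorded by the target microphone when retraining the KWS and ASR models.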

PhyAug for rescuing speech recognition

Bibliography

Our research

[1] Quality-driven Volcanic Earthquake Detection using Wireless Sensor Networks. Rui Tan, Guoliang Xing, Jinzhu Chen, Wen-Zhan Song, Renjie Huang. The 31st IEEE Real-Time Systems Symposium (RTSS), pp. 271-280, Nov 30 - Dec 3, 2010, San Diego, CA, USA.
[2] Volcanic Earthquake Timing using Wireless Sensor Networks. Guojin Liu, Rui Tan, Ruogu Zhou, Guoliang Xing, Wen-Zhan Song, Jonathan M. Lees. The 12th ACM/IEEE Conference on Information Processing in Sensor Networks (IPSN), April 8-11, 2013, Philadelphia, PA, USA. CPS Week 2013. (The first two authors are listed in alphabetical order.)
[3] ORBIT: A Smartphone-Based Platform for Data-Intensive Embedded Sensing Applications. Mohammad-Mahdi Moazzami, Dennis E. Phillips, Rui Tan, Guoliang Xing. IEEE Transactions on Mobile Computing (TMC). Vol. 16, No. 3, pp. 801-815, March 2017.
[4] Aquatic Debris Monitoring Using Smartphone-Based Robotic Sensors. Yu Wang, Rui Tan, Guoliang Xing, Jianxun Wang, Xiaobo Tan, Xiaoming Liu, Xiangmao Chang. The 13th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), April 15-17, 2014, Berlin, Germany.
[5] Samba: A Smartphone-Based Robot System for Energy-Efficient Aquatic Environment Monitoring. Yu Wang, Rui Tan, Guoliang Xing, Jianxun Wang, Xiaobo Tan, Xiaoming Liu. The 14th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN), April 13-17, 2015, Seattle, WA, USA.
[6] Deep Room Recognition Using Inaudible Echos. Qun Song, Chaojie Gu, Rui Tan. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), presented at the ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp), October 8-12, 2018, Singapore.