Speech Signal Processing
Automatic speech recognition (ASR) is the task of taking a speech utterance as input and converting it into text. ASR is a core technology for human-machine interface systems. Applications of ASR include smart-home interfaces, phones, and so on. ASR can be categorized into two fields.
Speech synthesis is the technique of converting a given text input into speech. It is actively used in various fields to convey information from machines to users through speech, such as smartphone interfaces, in-vehicle personal assistants, ARS, robot interfaces, etc.
Speech enhancement improves the intelligibility and quality of degraded speech using audio signal processing techniques and algorithms. Speech enhancement plays an important role in the post-processing stage of audio signal processing. Applications of speech enhancement include mobile and telecommunication systems, hearing aids, ASR, and so on. Speech enhancement can be categorized into three fields.
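As a concrete illustration of one classic enhancement technique, the sketch below implements single-frame spectral subtraction in plain Python with a naive DFT. The frame length, noise level, and spectral floor are illustrative assumptions, and the oracle noise estimate stands in for the noise tracking a real system would need; this is a sketch, not any particular system described here.

```python
import cmath
import math
import random

def dft(x):
    """Naive discrete Fourier transform, fine for a short demo frame."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    """Inverse DFT, returning the real part."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

def spectral_subtraction(noisy_frame, noise_mag, floor=0.05):
    """Subtract an estimated noise magnitude in each frequency bin,
    keep the noisy phase, and floor the result to limit over-subtraction."""
    X = dft(noisy_frame)
    Y = [cmath.rect(max(abs(Xk) - Nk, floor * abs(Xk)), cmath.phase(Xk))
         for Xk, Nk in zip(X, noise_mag)]
    return idft(Y)

# Toy frame: a sinusoid buried in white noise, with an oracle noise
# magnitude estimate (in practice this is estimated from speech pauses).
random.seed(1)
N = 64
clean = [math.sin(2 * math.pi * 4 * n / N) for n in range(N)]
noise = [random.gauss(0.0, 0.3) for _ in range(N)]
noisy = [c + w for c, w in zip(clean, noise)]
noise_mag = [abs(v) for v in dft(noise)]
enhanced = spectral_subtraction(noisy, noise_mag)
```

Because each bin's magnitude can only shrink, the enhanced frame always has less energy than the noisy one; the floor parameter trades residual noise against the "musical noise" artifacts that aggressive subtraction causes.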
Speech coding is an application of data compression to digital audio signals containing speech. The purpose of speech compression is to reduce the number of bits required to represent speech signals, in order to minimize the transmission bandwidth required or to reduce storage costs.
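One well-known way to spend fewer bits per sample is mu-law companding, as used in the G.711 standard: samples are compressed on a logarithmic curve before uniform quantization, so quiet samples (where speech spends most of its time) get finer resolution. The sketch below is a minimal plain-Python version; the test sample value and 8-bit depth are chosen only for illustration.

```python
import math

MU = 255.0  # mu-law parameter used in the G.711 standard

def mu_law_compress(x):
    """Compress a sample in [-1, 1] before quantization."""
    return math.copysign(math.log(1.0 + MU * abs(x)) / math.log(1.0 + MU), x)

def mu_law_expand(y):
    """Invert the compression after decoding."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)

def quantize(y, bits=8):
    """Uniformly quantize a value in [-1, 1] to 2**bits levels."""
    levels = 2 ** (bits - 1)
    return round(y * levels) / levels

# 8-bit coding of a quiet sample, where companding helps most.
x = 0.02
decoded = mu_law_expand(quantize(mu_law_compress(x)))
uniform = quantize(x)  # plain 8-bit uniform quantization, for comparison
```

For the quiet sample above, the companded coder's reconstruction error is roughly an order of magnitude smaller than the uniform quantizer's at the same bit rate.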
AI & Machine Learning
Artificial intelligence (AI) deals with implementing algorithms that allow machines to perform tasks once considered exclusively human activities. Research areas of AI related to speech and audio signal processing include automatic speech recognition (ASR), machine hearing, dialogue systems, and auditory scene understanding.
Machine learning is a branch of artificial intelligence that gives computers the ability to learn from and analyze data without being explicitly programmed.
Speech/audio signal processing is a field that retrieves information from an audio signal and classifies or estimates it according to the purpose at hand. It is currently one of the leading fields in the application of machine learning techniques.
Our laboratory has been researching applications of machine learning, such as hidden Markov models (HMMs) and support vector machines (SVMs), to speech/audio signal processing. In recent years, our research has focused on the application and development of state-of-the-art techniques such as deep neural networks (DNNs) and non-negative matrix factorization (NMF).
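As a sketch of one of the techniques mentioned above, the following is a minimal plain-Python NMF using the multiplicative updates for the Euclidean cost. The toy matrix, rank, and iteration count are illustrative assumptions, not the laboratory's actual models or data.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, rank, iters=500, eps=1e-9):
    """Factorize a nonnegative matrix V ~= W * H with multiplicative
    updates; nonnegativity is preserved because every update multiplies
    by a ratio of nonnegative quantities."""
    random.seed(0)
    m, n = len(V), len(V[0])
    W = [[random.random() + 0.1 for _ in range(rank)] for _ in range(m)]
    H = [[random.random() + 0.1 for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        WT = transpose(W)
        num, den = matmul(WT, V), matmul(matmul(WT, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        HT = transpose(H)
        num, den = matmul(V, HT), matmul(W, matmul(H, HT))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return W, H

# Exactly rank-1 nonnegative matrix: outer product of [1,2,3] and [1,1,2].
V = [[1.0, 1.0, 2.0], [2.0, 2.0, 4.0], [3.0, 3.0, 6.0]]
W, H = nmf(V, rank=1)
recon = matmul(W, H)
```

In audio work, V would typically be a magnitude spectrogram, with the columns of W acting as spectral basis vectors and the rows of H as their activations over time.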
Audio Signal Processing
When a musical instrument is played, the sound radiated from the instrument is affected by various elements before it is perceived by the listener, including the directivity pattern of the instrument and the room impulse response of the concert hall. Applications of spatial sound reproduction include 3D realistic audio systems, analysis of acoustic wave paths, and so on. This research addresses methods for reproducing spatial sound using a recorded anechoic sound source and a measured room impulse response.
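The core operation behind this kind of reproduction is convolving the dry (anechoic) recording with the measured room impulse response. The sketch below shows the direct-form version in plain Python; the two-sample source and three-tap impulse response are toy values for illustration only (real RIRs are thousands of taps, and FFT-based convolution would be used in practice).

```python
def convolve(source, rir):
    """Direct-form convolution of a dry source with a room impulse
    response: each source sample excites a scaled, delayed copy of
    the RIR."""
    y = [0.0] * (len(source) + len(rir) - 1)
    for n, s in enumerate(source):
        for k, h in enumerate(rir):
            y[n + k] += s * h
    return y

# Toy example: an RIR with a direct path (tap 0) and a single
# attenuated reflection two samples later (tap 2).
dry = [1.0, 2.0]
rir = [1.0, 0.0, 0.5]
wet = convolve(dry, rir)
```

The output contains the dry signal followed by its delayed, attenuated echo, which is exactly how a listener in the room would hear the direct sound plus a reflection.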
Audio scene recognition aims at the computational analysis of an acoustic environment and the recognition of distinct sound events in it. The focus of audio scene recognition is on recognizing the context or environment and on analyzing discrete sound events.
Acoustic localization is an area that studies the localization or tracking of acoustic sources or target microphones based on various measurements of the sound field. It can be applied to various location-based services such as virtual tour guides in museums, tracking of personal items, or guidance in shopping malls.
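A minimal sketch of one common localization cue, assuming a simple cross-correlation time-delay estimator between two microphone channels (not any specific system described here); the pulse shape and delay are toy values:

```python
def estimate_delay(ref, mic, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation
    between a reference channel and a second microphone channel."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = sum(ref[n] * mic[n + lag]
                    for n in range(len(ref)) if 0 <= n + lag < len(mic))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# The same pulse arrives 3 samples later at the second microphone.
ref = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mic = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
delay = estimate_delay(ref, mic, max_lag=5)
# Path-length difference in meters: delay / sample_rate * speed_of_sound.
```

Given delays from several microphone pairs and the array geometry, the source position can then be triangulated; practical systems usually replace the raw correlation with a generalized cross-correlation (e.g. GCC-PHAT) to cope with reverberation.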
A sound code is a wireless data transmission system that encodes data using audio data hiding technology and plays it through a loudspeaker. It can be applied to embedding product information, distributor information, coupon codes, etc. into advertisements.
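Actual data hiding embeds the bits imperceptibly inside a host signal; as a much simpler stand-in that still shows the encode/play/decode loop, the sketch below transmits bits as audible two-tone FSK and decodes them by comparing tone powers per symbol. The sample rate, tone frequencies, and symbol length are illustrative assumptions.

```python
import math

FS = 8000            # sample rate in Hz (illustrative assumption)
F0, F1 = 1000, 2000  # tone frequencies for bit 0 / bit 1
SYMBOL = 200         # samples per bit: 25 ms, whole cycles of both tones

def encode_bits(bits):
    """Turn a bit sequence into an audio waveform, one tone per bit."""
    out = []
    for b in bits:
        f = F1 if b else F0
        out.extend(math.sin(2 * math.pi * f * n / FS) for n in range(SYMBOL))
    return out

def tone_power(frame, f):
    """Power at frequency f, via correlation with sine and cosine
    (a single DFT bin)."""
    s = sum(x * math.sin(2 * math.pi * f * n / FS) for n, x in enumerate(frame))
    c = sum(x * math.cos(2 * math.pi * f * n / FS) for n, x in enumerate(frame))
    return s * s + c * c

def decode_bits(signal):
    """Recover bits by comparing the two tone powers in each symbol."""
    bits = []
    for i in range(0, len(signal), SYMBOL):
        frame = signal[i:i + SYMBOL]
        bits.append(1 if tone_power(frame, F1) > tone_power(frame, F0) else 0)
    return bits
```

A hiding-based sound code replaces the audible tones with sub-threshold modifications of music or speech, but the loudspeaker-to-microphone transmission and per-symbol detection structure is the same.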
An audio signal usually consists of a target sound and background noise. Audio source separation is the technique of extracting one or several signals of interest from a mixture signal, removing unwanted components from a recording.
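The simplest separation case arises when the sources occupy different frequency bands, so a mask over DFT bins splits the mixture exactly; the sketch below shows this in plain Python. The two sinusoidal "sources" and the cutoff bin are illustrative assumptions (real separation, e.g. NMF- or DNN-based, must handle sources that overlap in frequency).

```python
import cmath
import math

def dft(x):
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def idft(X):
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N)
                for k in range(N)).real / N for n in range(N)]

def separate_by_band(mixture, cutoff_bin):
    """Split a mixture into low- and high-band components by masking
    DFT bins (keeping the matching negative-frequency bins too)."""
    X = dft(mixture)
    N = len(X)
    is_low = lambda k: k <= cutoff_bin or k >= N - cutoff_bin
    low = [X[k] if is_low(k) else 0.0 for k in range(N)]
    high = [0.0 if is_low(k) else X[k] for k in range(N)]
    return idft(low), idft(high)

# Two non-overlapping sources: a slow sinusoid and a faster, quieter one.
N = 64
low_src = [math.sin(2 * math.pi * 2 * n / N) for n in range(N)]
high_src = [0.5 * math.sin(2 * math.pi * 12 * n / N) for n in range(N)]
mixture = [a + b for a, b in zip(low_src, high_src)]
low_est, high_est = separate_by_band(mixture, cutoff_bin=5)
```

Because each source lives entirely on its own side of the cutoff, the masked reconstructions recover both sources essentially perfectly; overlapping sources are what make separation a research problem.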