C4DM seminar – Applying Automatic Speech Recognition (ASR) Technologies to Singing (Anna Kruspe, Fraunhofer IDMT)
In the past 15 years, the field of Music Information Retrieval (MIR) has produced a multitude of interesting approaches for the analysis of various characteristics of music. However, one aspect that has not received much attention so far is the lyrical content of singing voices. This talk will provide an overview of my work on the application of Automatic Speech Recognition (ASR) technologies to singing.
I will focus on three topics:
- Language Identification is the task of detecting the language sung in a recording
- Keyword Spotting is concerned with finding certain keywords in a collection of music
- Lyrics-to-singing alignment is the task of matching known textual lyrics to the corresponding positions in a singing recording
I will give a short overview of the state of the art for each of these topics, present my own approaches (including two practical demonstrations), and talk about the current possibilities and limitations in this field of research.
Anna Kruspe is a researcher and Ph.D. student at the Fraunhofer Institute for Digital Media Technology (IDMT) in Ilmenau, Germany. In addition to her Ph.D. research on speech recognition for singing, she is currently working on projects about automatic speech/music discrimination, recommendation engines, and music classification for various industry clients. She received her Master’s degree in Media Technology from Ilmenau Technical University (TU Ilmenau) in 2011. Her final thesis, on the classification of musical pieces into global cultural areas, was supervised by Prof. Karlheinz Brandenburg and received the Hugo Geiger Prize of the Fraunhofer Society.
Prior to this, she worked on various speech recognition tasks as an intern at Toshiba Research Europe Ltd. Her Ph.D. work is funded through a scholarship from the Fraunhofer Society. From 2013 to 2014, she was an exchange researcher at the Center for Language and Speech Processing (CLSP) at Johns Hopkins University, where she was advised by Prof. Hynek Hermansky. This research stay was funded through a Fulbright scholarship.