SUPERB: Speech processing Universal PERformance Benchmark — Waseda University

Shu Wen Yang, Po Han Chi*, Yung Sung Chuang*, Cheng-I Jeff Lai*, Kushal Lakhotia*, Yist Y. Lin*, Andy T. Liu*, Jiatong Shi*, Xuankai Chang, Guan Ting Lin, Tzu Hsien Huang, Wei Cheng Tseng, Ko Tik Lee, Da Rong Liu, Zili Huang, Shuyan Dong, Shang Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung Yi Lee

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

117 Citations (Scopus)

Abstract

Self-supervised learning (SSL) has proven vital for advancing research in natural language processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on large volumes of unlabeled data and achieves state-of-the-art (SOTA) for various tasks with minimal adaptation. However, the speech processing community lacks a similar setup to systematically explore the paradigm. To bridge this gap, we introduce Speech processing Universal PERformance Benchmark (SUPERB). SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks with minimal architecture changes and labeled data. Among multiple usages of the shared model, we especially focus on extracting the representation learned from SSL for its preferable re-usability. We present a simple framework to solve SUPERB tasks by learning task-specialized lightweight prediction heads on top of the frozen shared model. Our results demonstrate that the framework is promising as SSL representations show competitive generalizability and accessibility across SUPERB tasks. We release SUPERB as a challenge with a leaderboard and a benchmark toolkit to fuel the research in representation learning and general speech processing.
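
The setup the abstract describes, a frozen shared model with a small trainable head per task, can be illustrated with a toy sketch. This is not the SUPERB toolkit or its API. The "upstream" below is a hypothetical stand-in (a fixed random projection rather than a pretrained SSL model), the two-class data is synthetic, and all names (`frozen_upstream`, `LinearHead`, `make_utterance`) are assumptions made for illustration. Only the head's parameters are updated during training.

```python
import math
import random

rng = random.Random(0)
FEATURE_DIM, NUM_CLASSES = 4, 2

# Hypothetical stand-in for a pretrained SSL model: a fixed projection, never updated.
FROZEN_W = [[rng.uniform(-1, 1) for _ in range(FEATURE_DIM)] for _ in range(2)]

def frozen_upstream(frames):
    """Map each 2-dimensional frame to a FEATURE_DIM feature vector."""
    return [[math.tanh(sum(frame[i] * FROZEN_W[i][j] for i in range(2)))
             for j in range(FEATURE_DIM)] for frame in frames]

def mean_pool(features):
    """Average frame-level features into one utterance-level vector."""
    return [sum(col) / len(features) for col in zip(*features)]

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

class LinearHead:
    """Lightweight task-specific head: the only trainable parameters."""
    def __init__(self):
        self.w = [[0.0] * FEATURE_DIM for _ in range(NUM_CLASSES)]
        self.b = [0.0] * NUM_CLASSES

    def predict_proba(self, x):
        return softmax([sum(wi * xi for wi, xi in zip(row, x)) + b
                        for row, b in zip(self.w, self.b)])

    def sgd_step(self, x, label, lr=0.5):
        probs = self.predict_proba(x)
        for c in range(NUM_CLASSES):
            # Gradient of the cross-entropy loss with respect to logit c.
            grad = probs[c] - (1.0 if c == label else 0.0)
            self.b[c] -= lr * grad
            for j in range(FEATURE_DIM):
                self.w[c][j] -= lr * grad * x[j]

def make_utterance(label, num_frames=5):
    """Synthetic utterance: noisy frames around (+1, +1) or (-1, -1)."""
    center = 1.0 if label == 0 else -1.0
    return [[center + rng.gauss(0, 0.1), center + rng.gauss(0, 0.1)]
            for _ in range(num_frames)]

dataset = [(make_utterance(label), label) for label in (0, 1) for _ in range(10)]
frozen_snapshot = [row[:] for row in FROZEN_W]

head = LinearHead()
for _ in range(20):
    for frames, label in dataset:
        # The upstream is only queried for features; no update touches FROZEN_W.
        head.sgd_step(mean_pool(frozen_upstream(frames)), label)

def predict(frames):
    probs = head.predict_proba(mean_pool(frozen_upstream(frames)))
    return probs.index(max(probs))

accuracy = sum(predict(f) == y for f, y in dataset) / len(dataset)
print(f"training accuracy: {accuracy:.2f}")
```

In this arrangement a new task would reuse `frozen_upstream` unchanged and train a fresh `LinearHead`, which is the re-usability argument the abstract makes for extracting representations instead of fine-tuning the shared model.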

Original language: English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Publisher: International Speech Communication Association
Pages: 3161-3165
Number of pages: 5
ISBN (Electronic): 9781713836902
Publication status: Published - 2021
Externally published: Yes
Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 2021 Aug 30 - 2021 Sept 3

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 4
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/Territory: Czech Republic
City: Brno
Period: 21/8/30 - 21/9/3

Keywords

  • Benchmark
  • Evaluation
  • Model generalization
  • Representation learning
  • Self-supervised learning
  • Speech

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
