Switchboard Telephone Speech Corpus

From Wikipedia, the free encyclopedia

The Switchboard Telephone Speech Corpus is a corpus of spoken English consisting of almost 260 hours of speech. It was created in 1990 by Texas Instruments under a DARPA grant and released in 1992 by NIST. The corpus contains 2,400 telephone conversations among 543 US speakers (302 male, 241 female).[1][2][3] Participants did not know each other, and conversations were held on topics from a predetermined list.[4]

Switchboard-2 Phase II was collected in 1999 and includes "4,472 five-minute telephone conversations involving 679 participants".[5]

The corpus was used in the development of speech recognition algorithms.[6]

Text example:[7]

A: All right um well [laughter-uh] let's see i'm twenty
B: How old are you Lisa. Okay that i'm older
A: Yeah how old are you. Older [laughter]
B: Older than you [laughter-are]
A: [laughter-okay]
B: Okay we are supposed to talk about places we like to go so i'm gonna and where are you from where are you calling from?
A: I'm calling from uh Provo Utah but I'm from Plano Texas
B: Oh you are from Plano my sister lives in Plano yes her husband is the new Director of Admissions at uh University of Texas at Dallas
A: Oh really. Oh wow my dad used to work at UTD also
B: Yeah so I [vocalized-noise]. Anyway so where's your favorite place to go?
A: Um. Generally we just go on family vacations to Arizona my grandparents live there that's generally our usual summer vacation
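The layout of the excerpt above (a speaker label, a colon, then text with bracketed annotations) can be split mechanically. The following is a minimal sketch in Python, not an official tool for the corpus. The function name `parse_turn` is hypothetical, and the reading of `[laughter-word]` as a word spoken while laughing is an assumption inferred from the excerpt; other bracketed items, such as `[vocalized-noise]`, are treated as non-speech tags and dropped from the word list.

```python
import re

# A turn looks like "A: some text": one capital-letter speaker label, a colon, the text.
TURN_RE = re.compile(r"^(?P<speaker>[A-Z]):\s*(?P<text>.*)$")
# Any bracketed annotation, e.g. "[laughter]" or "[laughter-uh]".
BRACKET_RE = re.compile(r"\[([^\]]+)\]")


def parse_turn(line):
    """Split one transcript line into speaker, plain words, and bracketed tags."""
    match = TURN_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a speaker turn: {line!r}")

    tags = []

    def replace_tag(m):
        tag = m.group(1)
        tags.append(tag)
        # Assumption: "[laughter-word]" marks a word spoken while laughing,
        # so the word itself is kept. Every other tag is removed from the text.
        if tag.startswith("laughter-"):
            return tag[len("laughter-"):]
        return ""

    text = BRACKET_RE.sub(replace_tag, match.group("text"))
    return {
        "speaker": match.group("speaker"),
        "words": text.split(),
        "tags": tags,
    }


turn = parse_turn("A: All right um well [laughter-uh] let's see i'm twenty")
print(turn["speaker"], turn["tags"], " ".join(turn["words"]))
```

Keeping the tags in a separate list, rather than discarding them, leaves the laughter and noise annotations available for later analysis while the word list stays clean.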

Further reading

  • Calhoun, Sasha; Carletta, Jean; Brenier, Jason M.; Mayo, Neil; Jurafsky, Dan; Steedman, Mark; Beaver, David (December 2010). "The NXT-format Switchboard Corpus: a rich resource for investigating the syntax, semantics, pragmatics and prosody of dialogue" (PDF). Language Resources and Evaluation. 44 (4): 387–419. doi:10.1007/s10579-010-9120-1. S2CID 5176936. Retrieved 26 January 2024.

References

  1. ^ "Switchboard-1 Release 2 - Linguistic Data Consortium". catalog.ldc.upenn.edu. Retrieved 26 January 2024.
  2. ^ "Papers with Code - Switchboard-1 Corpus Dataset". paperswithcode.com. Retrieved 26 January 2024.
  3. ^ Godfrey, John J.; Holliman, Edward C.; McDaniel, Jane (23 March 1992). "SWITCHBOARD: Telephone speech corpus for research and development". [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing. IEEE Computer Society. pp. 517–520. doi:10.1109/ICASSP.1992.225858. ISBN 0-7803-0532-9. S2CID 61412708. Retrieved 26 January 2024.
  4. ^ "NXT Swbd Overview". groups.inf.ed.ac.uk. Retrieved 26 January 2024.
  5. ^ "Switchboard-2 Phase II - Linguistic Data Consortium". catalog.ldc.upenn.edu. Retrieved 26 January 2024.
  6. ^ "Switchboard Transcription System". www1.icsi.berkeley.edu. Retrieved 26 January 2024.
  7. ^ Soni, Mayank; Spillane, Brendan; Gilmartin, Emer; Saam, Christian; Cowan, Benjamin R.; Wade, Vincent (2021). "An Empirical Study of Topic Transition in Dialogue". arXiv:2111.14188 [cs.CL].