|
|
../ ANDOSL - Australian National Database of Spoken Language |
| Developed by: |
 |
School of Electrical Engineering Sydney University, the National Acoustic Laboratories, the Speech Hearing and Language Research Centre Macquarie University and the Computer Sciences Laboratory Australian National University |
| Size: |
|
|
| Contents: |
|
"The spoken language data which comprises the first phase of the ANDOSL project was elicited either by written material which was read aloud or by graphical material which was discussed by two speakers thereby generating spontaneous speech." |
| Access: |
|
Data is available for purchase |
| Notes: |
|
Last update was in 1998. |
| |
|
 |
|
|
|
../ BAS corpora |
| Developed by: |
 |
BAS |
| Size: |
|
|
| Contents: |
|
The BAS corpora vary in size and content. Available corpora range from read to spontaneous speech to telephone speech and more. |
| Access: |
|
Listen to example audio files |
| Notes: |
|
|
| |
|
 |
|
|
|
../ BASE - British Academic Spoken English Corpus |
| Developed by: |
 |
Universities of Warwick and Reading, UK |
| Size: |
|
160 lecture and 40 seminar recordings (info recorded Jan. 2005) |
| Contents: |
|
Digital video recordings of seminars and lectures across different disciplines |
| Access: |
|
The project is still under development |
| Notes: |
|
BASE is developed as a companion to the MICASE corpus |
| |
|
 |
|
|
| ../ BNC Spoken Corpus |
| Developed by: |
 |
| Size: |
|
|
| Contents: |
|
|
| Access: |
|
|
| Notes: |
|
|
| |
|
 |
|
|
| ../ CANCODE - Cambridge and Nottingham Corpus of Discourse in English |
| Developed by: |
 |
Cambridge University Press |
| Size: |
|
5 million words |
| Contents: |
|
spoken materials; spontaneous speech only; recordings collected between 1995 and 2000 |
| Access: |
|
Access is currently restricted to members of Cambridge University Press |
| Notes: |
|
"[A]ll the recordings have been coded according to the relationship between the speakers." |
| |
|
 |
|
|
| ../ CHRISTINE Corpus |
| Developed by: |
 |
| Size: |
|
|
| Contents: |
|
spoken materials; spontaneous speech only |
| Access: |
|
|
| Notes: |
|
|
| |
|
 |
|
|
|
|
../ CPSA - Corpus of Spoken American English |
| Developed by: |
 |
Michael Barlow, Athelstan |
| Size: |
|
2 main sub-corpora, 1 million words each |
| Contents: |
|
short interchanges by 400 speakers in professional settings broadly tied to academia and politics; materials recorded between 1994 and 1998 |
| Access: |
|
Registered users only ($49 for the individual license) |
| Notes: |
|
The CPSA is also available tagged. |
| |
|
 |
|
|
|
../ CSLU Spoken Corpora |
| Developed by: |
 |
Center for Spoken Language Understanding |
| Size: |
|
|
| Contents: |
|
The CSLU hosts a number of different spoken corpora (English and other languages) ranging from telephone conversations to spontaneous utterances from children. |
| Access: |
|
Available for ordering |
| Notes: |
|
|
| |
|
 |
|
|
|
../ DCPSE - Diachronic Corpus of Present-day Spoken English |
| Developed by: |
 |
Department of English (Survey of English Usage), University College London |
| Size: |
|
|
| Contents: |
|
"The project aims to construct a fully parsed and searchable diachronic corpus of spontaneous spoken English, containing carefully selected and directly comparable texts from the LLC and ICE-GB corpora."
|
| Access: |
|
Available for purchase (Student Licence 25 GBP) |
| Notes: |
|
|
| |
|
 |
|
|
|
|
| ../ FRED - Freiburg English Dialect Corpus |
| Developed by: |
 |
Bernd Kortmann, English Department, University of Freiburg |
| Size: |
|
2.5 million words, 300 hours of recorded speech |
| Contents: |
|
"... 372 interviews with male and female speakers from 163 different locations in 43 different counties in 9 major dialect areas." |
| Access: |
|
Sample texts and audio files available online |
| Notes: |
|
Detailed documentation available. |
| |
|
 |
|
|
| ../ IViE Corpus - English Intonation in the British Isles |
| Developed by: |
 |
Phonetics Laboratory, University of Oxford |
| Size: |
|
36 hours of speech data |
| Contents: |
|
Modern or mainstream dialects; recordings from London, Cambridge, Cardiff, Liverpool, Bradford, Leeds, Newcastle, Belfast (Northern Ireland), and Dublin (Republic of Ireland) |
| Access: |
|
Available for download free of charge; Online search through web interface |
| Notes: |
|
|
| |
|
 |
|
|
| ../ LCIE - Limerick Corpus of Irish-English |
| Developed by: |
 |
University of Limerick in conjunction with Mary Immaculate College, Limerick |
| Size: |
|
1 million words; 375 transcripts |
| Contents: |
|
Recorded conversations: casual, professional, transactional, pedagogical conversations |
| Access: |
|
Under development |
| Notes: |
|
|
| |
|
 |
|
|
|
| ../ LLC - London-Lund Corpus of Spoken English |
| Developed by: |
 |
Department of English, Lund University, Sweden |
| Size: |
|
~ 500,000 words |
| Contents: |
|
spoken British English; 100 texts; materials date from 1959 to 1975 |
| Access: |
|
available on the ICAME CD-ROM; check out the online sample |
| Notes: |
|
The LLC is the result of two projects: the Survey of English Usage (SEU), begun in 1959 at University College London, and the Survey of Spoken English (SSE), begun in 1975 at Lund University. |
| |
|
 |
|
|
|
../ Longman Spoken American Corpus |
| Developed by: |
 |
Longman Corpus Network |
| Size: |
|
5 million words |
| Contents: |
|
"everyday conversations of more than 1000 Americans of various age groups, levels of education, and ethnicity, and includes speakers from over 30 US States" |
| Access: |
|
Currently restricted to members of Longman Corpus Network |
| Notes: |
|
|
| |
|
 |
|
|
|
../ LUCY - Structure in Written English in the UK |
| Developed by: |
 |
Geoffrey Sampson |
| Size: |
|
165,000 words |
| Contents: |
|
written British English ('polished', young adult and child writing - imaginative and informative); 239 text files, each sample ~2000 words |
| Access: |
|
Free download |
| Notes: |
|
Structurally annotated - LUCY is a treebank |
| |
|
 |
|
|
|
../ MICASE |
| Developed by: |
 |
English Language Institute at University of Michigan, U.S. |
| Size: |
|
190 hours of recordings (152 transcripts, approx. 1.7 million words) |
| Contents: |
|
Academic speech (e.g. lectures, colloquia, study groups, etc.); The recordings were made during the period 1997-2001. |
| Access: |
|
Free access to the online version of the corpus; offline versions of the texts, both tagged and untagged, are available for sale; 70 sound files available online as Real Audio; MP3 files on CD-ROM will be available for purchase soon. |
| Notes: |
|
|
| |
|
 |
|
|
|
../ MARSEC - Machine-Readable Spoken English Corpus |
| Developed by: |
 |
School of Linguistics, Reading University |
| Size: |
|
|
| Contents: |
|
"The Marsec corpus of spoken standard southern British English is a development of the Lancaster/IBM spoken English corpus (SEC)." The acoustic recordings and word-level time alignments are available. |
| Access: |
|
Available for 200 GBP. |
| Notes: |
|
|
| |
|
 |
|
|
|
|
../ TRAINS Dialogue Corpus |
| Developed by: |
 |
Conversational Interaction and Spoken Dialogue Research Group |
| Size: |
|
|
| Contents: |
|
Problem solving dialogues |
| Access: |
|
Transcriptions of the dialogues are available |
| Notes: |
|
|
| |
|
 |
|
|
|
| |