Historical Corpora

ARCHER, Brooklyn corpus, CEEC, CEECS, CME, CMEP, Corpus del Español, HCOS, HCET, Historical Welsh, MEMEM, Newdigate, PPCME2, YCOE

../ ARCHER - A Representative Corpus of Historical English Registers
Developed by: Original coordinator: Prof. Douglas Biber at Northern Arizona University in Flagstaff; currently in cooperation with Profs. Mair and Kortmann at the University of Freiburg
Size:   1.7 million words
Contents:   various textual genres in British and American English; ~ 1650s - ?
Access:    
Notes:    

../ The Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English
Developed by: Joint project of five linguists
Size:   106,210 words
Contents:   selection of texts from the Old English section of the Helsinki Corpus of English texts
Access:   free access for research purposes; fill out the request form
Notes:   The developers of the Brooklyn corpus now recommend using the YCOE instead.

../ Corpus of Early English Correspondence (CEEC)
Developed by: Sociolinguistics and Language History project team at the Department of English, University of Helsinki
Size:   2.7 million words
Contents:   Personal letters written in England between c.1410 and 1680
Access:   Restricted to research; refer to CEECS for general access
Notes:   Since 2000 work has been under way to extend the corpus to the eighteenth century. Tagging of the corpus was finished in 2004 and parsing is continuing into 2005.

../ Corpus of Early English Correspondence Sampler (CEECS)
Developed by: Sociolinguistics and Language History project team at the Department of English, University of Helsinki
Size:   450,000 words
Contents:   The CEECS contains the non-copyrighted materials included in the original corpus
Access:   available on the ICAME CD-Rom; check out the online sample
Notes:    

../ CME - Corpus of Middle English Prose and Verse
Developed by: HTI - University of Michigan, U.S.
Size:   54 texts
Contents:   Collection of Middle English texts provided by the University of Michigan and the Oxford Text Archive
Access:   Free; searches can be run over individual books or groups of books; simple, Boolean, and proximity searches are supported
Notes:   SGML Markup according to the TEI guidelines

../ Corpus of Late Modern English Prose
Developed by: Department of English and American Studies, University of Manchester
Size:   ~ 100,000 words
Contents:   informal private letters by British writers; 1861 to 1919
Access:   Freely available through the Oxford Text Archive
Notes:    

../ Corpus del Español
Developed by: Mark Davies, Brigham Young University
Size:   100 million words
Contents:   heterogeneous; corpus includes fictional, non-fictional and oral texts; 1200s to 1900s
Access:   An elaborate online search is available that allows complex search patterns in individual parts of the corpus or in the entire corpus
Notes:   This service is freely available.

../ Helsinki Corpus of Older Scots (HCOS)
Developed by: Anneli Meurman-Solin of the Sociolinguistics and Language History project team at the Department of English, University of Helsinki
Size:    
Contents:   1450 to 1700
Access:   available on the ICAME CD-Rom; check out the online sample
Notes:   Bibliography online at ICAME

../ Helsinki Corpus of English Texts
Developed by: Department of English, University of Helsinki
Size:    
Contents:   "A multi-genre corpus, with a thousand-year timespan from the earliest documents to 1710."
Access:   The Diachronic part is available on the ICAME CD-Rom; check out the online sample; also available via the Oxford Text Archive
Notes:    

../ A Historical Corpus of the Welsh language 1500-1850
Developed by: Department of Linguistics, University of Cambridge
Size:   420,000 words
Contents:   heterogeneous; 30 texts, sample size ~15,000 words; 1500-1850
Access:   The texts are available for download; an online search function is also provided
Notes:   XML encoded

../ MEMEM - Michigan Early Modern English Materials
Developed by: Richard W. Bailey, Jay L. Robinson, James W. Downer, with Patricia V. Lehman.
Size:   50,000 records
Contents:   "The Materials consist of citations collected for the modal verbs and certain other English words for the Early Modern English Dictionary."
Access:   Free online access through a search interface is available
Notes:    

../ Newdigate Newsletters
Developed by: Philip Hines, Jr.
Size:   2100 manuscript newsletters in the Newdigate series
Contents:   manuscript newsletters; 13 January 1673/4 to 29 September 1715
Access:   available on the ICAME CD-Rom; check out the online sample
Notes:    

../ Penn-Helsinki Parsed Corpus of Middle English - PPCME2
Developed by: University of Pennsylvania, U.S.
Size:   1.3 million words; 55 text samples
Contents:   prose text samples of Middle English
Access:   Both the corpus and the software 'Corpus Search' are available on CD-Rom
Notes:   syntactic annotation

../ The York-Toronto-Helsinki Parsed Corpus of Old English Prose (YCOE)
Developed by: Department of Language and Linguistic Science, University of York, UK
Size:   1.5 million words
Contents:   Details on the selected texts and their copyrights
Access:   free access for non-commercial purposes; fill out order form
Notes:   syntactically annotated; sister corpus to the Penn-Helsinki Parsed Corpus of Middle English (PPCME2)