|
|
| ../ BoLC - The Bononia Legal Corpus |
| Developed by: |
 |
Centre of Theoretical and Applied Linguistics 'L. Heilmann' (Rema Rossini Favretti, Fabio Tamburini) |
| Size: |
|
Italian subcorpus: 33.5 million words / English subcorpus: 21 million words |
| Contents: |
|
The Bononia Legal Corpus is a multilingual comparable legal corpus |
| Access: |
|
Online corpus query for authorized users only. |
| Notes: |
|
|
| |
|
 |
|
|
| ../ Corpus di Italiano Scritto - CORIS |
| Developed by: |
 |
CILTA - Centro Interfacoltà di Linguistica Teorica e Applicata |
| Size: |
|
approx. 100 million words |
| Contents: |
|
"CORIS contains 100 million words and will be updated every two years by means of a built-in monitor corpus. It consists of a collection of authentic and commonly occurring texts in electronic format chosen by virtue of their representativeness of modern Italian." |
| Access: |
|
The Corpus Query Form is available to authorised users only. You can request access for research purposes. |
| Notes: |
|
There is also a DEMO version available. |
| |
|
 |
|
|
| ../ CRATER Multilingual Aligned Annotated Corpus |
| Developed by: |
 |
Lancaster University (Computing Department) |
| Size: |
|
approx. 1 million words |
| Contents: |
|
Trilingual: English, French and Spanish - telecommunications texts |
| Access: |
|
Online access; the text files can also be downloaded via FTP |
| Notes: |
|
aligned at sentence level; POS tagged in all three languages |
| |
|
 |
|
|
| ../ Czech National Corpus - CNC |
| Developed by: |
 |
Institute of the Czech National Corpus (Faculty of Arts, Charles University, Prague) |
| Size: |
|
Synchronic (written): 100 million words (SYN2000)
Synchronic (spoken): 800K words (Prague Spoken Corpus)
Diachronic: the bank of diachronic Czech contains 2 million words of transcribed texts, 100K of transliterated texts and 200K of dialect texts |
| Contents: |
|
"The aim of the project Czech National Corpus is to build up the Czech language corpora and, subsequently, to retrieve information from them. The CNC consists of two main parts: synchronous and diachronic." |
| Access: |
|
SYN2000 is available free of charge after registration; the 20-million-word SYN2000 PUBLIC can be searched online. |
| Notes: |
|
|
| |
|
 |
|
|
| |
| ../ ET 10-63 Corpus (bilingual; parallel) |
| Developed by: |
 |
|
| Size: |
|
1.25 million words of each language |
| Contents: |
|
English and French official documents on telecommunications |
| Access: |
|
|
| Notes: |
|
POS tagged and lemmatized |
| |
|
 |
|
|
| ../ French Corpus |
| Developed by: |
 |
Cambridge University Press/Cornell University |
| Size: |
|
35,303 words |
| Contents: |
|
"The French corpus is currently comprised of 51 hours of spoken French recorded in Paris, Grenoble, Montpellier and Avignon." (Late 1990s) |
| Access: |
|
|
| Notes: |
|
|
| |
|
 |
|
|
| |
| ../ Hungarian National Corpus - HNC |
| Developed by: |
 |
Department of Corpus Linguistics of the Research Institute for Linguistics of the Hungarian Academy of Sciences (HAS) |
| Size: |
|
187.6 million words (07/2006) |
| Contents: |
|
The Hungarian National Corpus is divided into five subcorpora (text genres/regional variety) |
| Access: |
|
Free of charge (Registration required) |
| Notes: |
|
This is a balanced reference corpus of written contemporary Hungarian |
| |
|
 |
|
|
| ../ Korpus 2000: Danish Corpus Project |
| Developed by: |
 |
The Society for Danish Language and Literature |
| Size: |
|
approx. 28 million words |
| Contents: |
|
Various texts written from 1998 to 2002 |
| Access: |
|
freely available to the public |
| Notes: |
|
"It is also possible to search the Korpus 90 (1988-1992) which is similar to the Korpus 2000 in its composition and size and hence serves as an older comparative corpus for the Korpus 2000." |
| |
|
 |
|
|
| ../ National Corpus of Irish - Corpas Náisiúnta na Gaeilge |
| Developed by: |
 |
Institiúid Teangeolaíochta Éireann |
| Size: |
|
approx. 30 million words (8 million SGML tagged) |
| Contents: |
|
Contemporary books, newspapers, periodicals and dialogue |
| Access: |
|
Corpus available for purchase (€ 50 for research purposes) |
| Notes: |
|
|
| |
|
 |
|
|
| ../ Oslo Corpus of Bosnian Texts |
| Developed by: |
 |
IMS - Institut für Maschinelle Sprachverarbeitung |
| Size: |
|
1.5 million words |
| Contents: |
|
"[It] comprises several different genres: fiction (novels and short stories), essays, children's stories, folklore, Islamic texts, legal texts, and newspapers and journals. The texts, written by authors from Bosnia and Herzegovina, have for the most part been published in the 1990s." |
| Access: |
|
Freely available for non-commercial academic research. |
| Notes: |
|
|
| |
|
 |
|
|
|
|
| ../ Slovak National Corpus |
| Developed by: |
 |
Jazykovedný ústav Ľ. Štúra SAV |
| Size: |
|
294 087 581 tokens |
| Contents: |
|
60.6% journalistic texts, 17.5% fiction, 11.6% specialized texts, 10.3% others (contemporary Slovak language texts) |
| Access: |
|
Free of charge: via the WWW interface, or full access for research (registration required) |
| Notes: |
|
|
| |
|
 |
|
|
| ../ Uppsala Corpus of Russian |
| Developed by: |
 |
Slaviska Institutionen, Uppsala Universitet |
| Size: |
|
~ 1 million words |
| Contents: |
|
"600 Russian texts with a total of one million running words (word tokens), equally divided between informative and literary prose (1960-1989)." |
| Access: |
|
Online search access (Cyrillic or Latin transliteration) at Tübingen |
| Notes: |
|
|
| |
|
 |
|
|