|
|
| |
|
|
|
CRATER: Multilingual Aligned Annotated Corpus |
| Developed by: |
 |
Computing Department at Lancaster University |
| Size: |
|
~1 million words |
| Contents: |
|
Trilingual (English, French and Spanish) telecommunications texts |
| Access: |
|
Online access; the text files can also be downloaded via FTP |
| Notes: |
|
Aligned at the sentence level; POS-tagged in all three languages |
| |
|
 |
|
|
|
ENPC - English Norwegian Parallel Corpus |
| Developed by: |
 |
University of Oslo, Norway |
| Size: |
|
100 original texts and 100 translated texts, amounting to some 2.6 million words in all |
| Contents: |
|
Fiction and non-fiction texts |
| Access: |
|
Access to the corpus is restricted to the staff and students of the University of Oslo. |
| Notes: |
|
|
| |
|
 |
|
|
|
ESPC - English Swedish Parallel Corpus |
| Developed by: |
 |
Bengt Altenberg; Karin Aijmer; Mikael Svensson |
| Size: |
|
64 English text samples and translations; 72 Swedish text samples and translations; total corpus size 2.8 million words |
| Contents: |
|
"With few exceptions, the samples have been taken from texts published since 1980. Most major regional varieties of English are represented (British, American, Canadian, Irish, South African) but no attempt has been made to achieve a systematic or 'representative' distribution of these. Only written texts are represented. A number of prepared speeches have been included but they have their origin in writing and do not reflect genuine speech. Other categories that are missing in the corpus are, for example, newspaper text, private letters and business correspondence." |
| Access: |
|
Restricted to researchers and students at the Universities of Lund and Göteborg |
| Notes: |
|
Corpus manual available |
| |
|
 |
|
|
|
IJS ELAN - Slovene-English Parallel Corpus |
| Developed by: |
 |
Dept. of Intelligent Systems, Institute Jozef Stefan |
| Size: |
|
1 million words from 15 parallel Slovene-English / English-Slovene texts |
| Contents: |
|
|
| Access: |
|
Free access; the corpus can be downloaded or queried through the project's online concordancer |
| Notes: |
|
"the corpus is tokenised, sentence segmented and aligned; encoded as a translation memory in SGML TEI P3" |
| |
|
 |
|
|
|
Oslo Multilingual Corpus |
| Developed by: |
 |
Interdisciplinary research project Languages in Contrast (SPRIK) at the University of Oslo |
| Size: |
|
The OMC is made up of more than 10 sub-corpora of varying sizes. |
| Contents: |
|
"The Oslo Multilingual Corpus (OMC) is a collection of text corpora comprising original texts and translations from several languages. The various sub-corpora differ in that they contain a different number of languages or a different combination of languages. The OMC provides unique research material for use in contrastive studies and translation studies, as well as in theoretical and applied linguistics." |
| Access: |
|
Access is available for research purposes; an application form is provided online |
| Notes: |
|
|
| |
|
 |
|
|