JRC-Acquis: a large aligned parallel corpus in 21 languages, freely available

Readers on this list may be interested in the availability of the 'JRC-Acquis' parallel corpus:

SIZE AND FORMAT
- 21 languages (all 20 official EU languages plus Romanian)
- Average corpus size: 8.8 million words per language
- XML format according to TEI P4, UTF-8-encoded
- Modular: download the languages you need

LANGUAGES
Czech, Danish, Dutch, English, Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish, Swedish.

TEXT TYPES
- Documents on contents, principles and political objectives of the EU Treaties
- EU legislation
- Declarations
- Resolutions
- Acts
- International agreements

PARAGRAPH ALIGNMENT
- Paragraph-aligned for all 210 language pairs
- Paragraphs are sentence parts, sentences, or groups of sentences
- 2 alternative alignments: using Vanilla and HunAlign
- Ca. 270,000 alignments per language pair

MANUAL SUBJECT DOMAIN CLASSIFICATION
- Manually classified according to EUROVOC subject domains
- Selected from 6000 hierarchically organised classes, wide-coverage

USE / DOWNLOAD
- Download from http://langtech.jrc.it/JRC-Acquis.html
- Usage free for research purposes

FOR MORE DETAILS
Steinberger Ralf, Bruno Pouliquen, Anna Widiger, Camelia Ignat, Tomaž Erjavec, Dan Tufiş, Dániel Varga (2006). 'The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages'. Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC'2006). Genoa, Italy, 24-26 May 2006. Available at http://langtech.jrc.it/#Publications.

CONTACT FOR FURTHER INFORMATION
Ralf Steinberger (Ralf.Steinberger@xxxxxx)
European Commission - Joint Research Centre (JRC)
IPSC - SeS - Language Technology
URL: http://langtech.jrc.it, http://press.jrc.it/NewsExplorer
T.P. 267, Via Fermi 1
21020 Ispra (VA), Italy
Tel: +39 0332 78-6271
Fax: +39 0332 78-5154
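
A note on the figure of 210 language pairs quoted under PARAGRAPH ALIGNMENT: it is simply the number of unordered pairs that can be formed from the 21 languages listed, i.e. 21 x 20 / 2. The short Python sketch below only illustrates that arithmetic; the variable names are made up for this example and are not part of the corpus distribution or of any JRC tooling.

```python
from itertools import combinations

# The 21 corpus languages, as listed in the announcement.
LANGUAGES = [
    "Czech", "Danish", "Dutch", "English", "Estonian", "German", "Greek",
    "Finnish", "French", "Hungarian", "Italian", "Latvian", "Lithuanian",
    "Maltese", "Polish", "Portuguese", "Romanian", "Slovak", "Slovene",
    "Spanish", "Swedish",
]

# Every unordered pair of distinct languages (direction is ignored,
# so Czech-Danish and Danish-Czech count once).
pairs = list(combinations(LANGUAGES, 2))

print(len(LANGUAGES))  # 21
print(len(pairs))      # 210
```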