
Re: Using MTurk for markup tasks (was Cost of part of speech tagging): msg#00160

science.linguistics.corpora

Subject: Re: Using MTurk for markup tasks (was Cost of part of speech tagging)

Alexandre Rafalovitch wrote:
An interesting approach would be to use Amazon Mechanical Turk for
these kinds of tasks.
...
Has anybody else given a thought to this?

I don't know which languages you're interested in. I have thought about "wikifying" other sorts of projects for "low density" languages, such as finding and keeping track of on-line computational resources or building bilingual text collections. I have never actually tried this, but it may be instructive to look at the languages for which there are substantial Wikipedia and Wiktionary resources. Last time I looked, the usual suspects (the major and some "minor" European languages, plus Japanese) had at least 100k Wikipedia articles, while a slightly wider variety of languages had at least 10k articles, including Arabic (i.e. MSA), Persian, Hebrew, Indonesian, Korean, Malay, Thai, Turkish and Chinese. For comparison, the English Wikipedia has 1.5 million articles.

My guess is that "wikification" (with Amazon Mechanical Turk counted as one form of it) will work best for languages that have a substantial number of speakers with idle time, sufficient income to afford a computer and network connection, and sufficient education for the specific annotation task.
--
Mike Maxwell
maxwell@xxxxxxxxxxxxxx



