Cost of POS tagging, again (msg#00167, science.linguistics.corpora)
Hi, Marc et al.,

Christopher's points are well made. A couple of other things to think about:

1) You seem to be envisioning doing ex nihilo manual POS annotation. However, that will probably be neither practical nor desirable; rather, you're likely to want to do the initial annotation automatically and then manually curate the output of that automatically generated annotation step.

2) You may not actually want to curate the POS tagging directly at all. If you're going to do further processing (say, syntactic parsing), you might want to curate the POS tags as part or a byproduct of the downstream curation.

3) Even if you do want to curate the POS tagging directly, you will probably find some efficiencies to be gained from automatic means. For example, you are more likely to need to correct a bunch of adjective/past participle distinctions (I'm assuming here that your data is English) than a bunch of mis-tagged commas (although I have certainly seen lots of mis-POS-tagged commas, too!). Scripting can help you out here.

Finally, Christopher is right on in suggesting hourly, rather than per-token, budgeting.

Hope this is helpful,
Kevin

--
K. B. Cohen
Biomedical Text Mining Group Lead
Center for Computational Pharmacology
303-916-2417 (cell)
303-377-9194 (home)
http://compbio.uchsc.edu/Hunter_lab/Cohen
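The scripting suggested in point 3 might look something like the following. This is a minimal sketch, not a tool from the original message. It assumes the automatic tagger writes one sentence per line in `word/TAG` format with Penn Treebank tags, where JJ is an adjective and VBN is a past participle. The function names, the `CONFUSABLE_TAGS` set, and the "ends in -ed" heuristic are all hypothetical illustrations. The script collects candidate tokens grouped by word form, so a curator can review every occurrence of a word in one pass. It also lists words that received both tags somewhere in the corpus, which is one cheap place to start looking.

```python
from collections import defaultdict

# Hypothetical target set: Penn Treebank JJ (adjective) and VBN (past
# participle), the confusion discussed above. Adjust for other tagsets.
CONFUSABLE_TAGS = {"JJ", "VBN"}


def parse_tagged(line):
    """Split a 'word/TAG word/TAG ...' line into (word, tag) pairs."""
    pairs = []
    for item in line.split():
        # rpartition splits on the LAST slash, so 'and/or/CC' -> ('and/or', 'CC').
        word, sep, tag = item.rpartition("/")
        if sep:  # skip items with no slash at all
            pairs.append((word, tag))
    return pairs


def flag_adj_participle(sentences):
    """Group -ed tokens tagged JJ or VBN by lowercased word form.

    Returns {word: [(sentence_index, token_index, tag), ...]} so that a
    curator can review every occurrence of one word in a single pass.
    The -ed test is a heuristic: irregular participles ('broken') are missed.
    """
    hits = defaultdict(list)
    for s_idx, line in enumerate(sentences):
        for t_idx, (word, tag) in enumerate(parse_tagged(line)):
            if tag in CONFUSABLE_TAGS and word.lower().endswith("ed"):
                hits[word.lower()].append((s_idx, t_idx, tag))
    return dict(hits)


def inconsistent_words(hits):
    """Return words that received more than one target tag, with those tags.

    Inconsistency is a cheap signal of where to look first, not proof of error.
    """
    result = {}
    for word, occurrences in hits.items():
        tags = {tag for _, _, tag in occurrences}
        if len(tags) > 1:
            result[word] = sorted(tags)
    return result


if __name__ == "__main__":
    sample = [
        "The/DT annotated/JJ corpus/NN was/VBD released/VBN ./.",
        "The/DT annotated/VBN files/NNS are/VBP here/RB ./.",
    ]
    hits = flag_adj_participle(sample)
    for word, tags in inconsistent_words(hits).items():
        print(word, tags, hits[word])
```

Grouping by word form rather than by sentence is a deliberate choice here. A human can often decide the JJ/VBN question for many occurrences of the same word at once, which fits the "curate the automatic output" workflow from point 1.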